A drop in the ocean


Wave power and other renewable-energy resources deserve carefully targeted government support. 


If the world is to wean itself off fossil fuels, a wide range of alternative
energy sources will have to be brought into play. The geographically
dispersed nature of renewable resources, including
power from solar, wind, wave, tidal and geothermal sources, underscores
the need for different nations to develop viable alternatives that
utilize the resources they are best placed to exploit.

But some technologies are struggling to make their 
mark. The harnessing of wave power, for example, has 
so far had mixed results. This renewable resource held 
considerable currency in some territories during earlier 
energy crises, but it has yet to make any real contribution 
to the global energy mix (see page 156). After the energy 
crises of 1974 and 1979, nations in the stormy northern Atlantic 
Ocean, including Britain and Norway, set up relatively modest pro- 
grammes to explore wave power. But faced with assessments suggest- 
ing that the costs of wave power were unlikely to fall quickly enough 
to render it competitive, government backing for wave energy was 
all but abandoned. 

Now the energy crisis is back with a bang, and numerous privately 
run companies around the world are testing wave-power devices, 
many of them developed in collaboration with university research- 
ers. All of the designs face common obstacles. They will need to 
survive in a physically hostile and corrosive environment, which 
will sometimes subject them to forces ten or twenty times as great as 
those they need for normal operation. And although economies of 
scale will reduce the costs of wave-power plants, such reductions are 
likely to follow the unspectacular trajectories enjoyed by, say, build- 
ers of marine engines, rather than the spectacular leaps achieved by 
manufacturers of silicon chips. 

These are the considerations that have, in effect, relegated wave 
power to a ‘second tier’ of renewable-energy resources that do not 
attract substantial public- or private-sector backing. Yet there is a strong 
argument, given the grim outlook for the world’s energy supply, that 
such support should be forthcoming so that the commercial viability of
the more promising wave technologies can be examined more fully.

The London-based Carbon Trust, a company set up by the UK 
government to promote a low-carbon economy, has identified wave 
energy as one of Britain’s most promising renewable resources, with 
the potential to provide up to 20 gigawatts of power by 2050. But the 
trust estimates that it could cost £2.2 billion (US$4.6 
billion) in development to reduce the cost of wave-gen- 
erated electricity from current estimates of between 12p 
and 44p to a competitive 6p per kilowatt-hour. 

That sum may seem daunting to the British government
acting on its own; but in global terms, it isn't much.
The Carbon Trust estimates, for example, that Denmark 
has so far spent £1.3 billion on the development of wind power. The 
Japanese government has invested at least £1 billion in solar power. 
And don't mention it to the nuclear lobby, but the amount of public 
money invested to make atomic power fit-for-purpose was orders of 
magnitude higher. 

Both governments and private investors, of course, need assurance 
that any wave-power technologies they decide to support will have 
some worth. To gauge the potential of different designs, it can be 
valuable for backers of rival technologies to benchmark prototype 
equipment and compare it objectively with the competition. 

A promising model in this regard is the European Marine Energy 
Centre in the Orkney Islands in Scotland, a testing site set up in 2003 
that receives support from Edinburgh, London and Brussels. The 
centre helps private companies to test their wave-power designs. One 
firm, Edinburgh-based Pelamis, has already tested and improved 
its design at the centre, and four more are expected to do so in the 
next two years. 

Such benchmarking can, of course, get wave energy only so far. At 
some stage, it will have to take its chances on the open market. But 
in the meantime, governments whose coastlines may be suitable for 
wave energy should support promising technologies to an extent that 
will at least allow for a firmer measure of their viability.


The great divide 


The gap between theory and practice remains 
surprisingly wide in conservation biology. 


Men and women do not decide to become conservation biologists
because they yearn for riches and fame, for swimming
pools or caviar. They decide to become conservation biologists
because they want to stop species from becoming extinct.
So it can sometimes come as a surprise for outsiders to learn how
far removed the conservation biologist often is from actual efforts
to save species. Most of the time, conservation biologists describe
problems, float solutions, prioritize areas and actions, and run computer
models of natural ecosystems. They are cartographers of crises,
producing demoralizing maps of threat and extinction. They are
adept at coming up with ever-better methods of doing more with less,
at least in theory (see page 152).

It generally falls to a separate and amorphous group, known as 
‘practitioners’, to buy land, put up fences, set fires, put out fires, lobby
politicians, negotiate with farmers, spray invasive weeds, poison rats 
and guard against poachers. These people are generally not conser- 
vation biologists: they are civil servants, environmental consultants, 
park managers or environmental lobbyists. 

The distance between these two groups creates a sometimes-yawning
‘implementation gap’ between theory and practice. Conservation
biologists write and publish papers, which the practitioners seldom
read. The practitioners, in turn, rarely document their actions or 
collate their data in forms useful to conservation biologists. Typi- 
cally, practitioners make decisions based on personal experience and 
intuition. Their knowledge stays untapped by others — and can be 
impervious to fresh scientific findings. 

The existence of this gap has been acknowledged, and numerous 
efforts are already directed at bridging it. Some publications try to 
bring scientific news to practitioners. William Sutherland, a conser- 
vation biologist at the University of Cambridge, UK, runs a site called 
ConservationEvidence.com where practitioners are encouraged to 
deposit reports on the outcomes of their interventions — successful 
or otherwise. Data from these reports can then be fed into systematic 
reviews of the kind being done by Andrew Pullin at Bangor Uni- 
versity in Wales, whose Centre for Evidence-Based Conservation 
attempts to answer questions such as ‘are Japanese knotweed control 
and eradication interventions effective?’ 

There have been many calls for more mid-career training of prac- 
titioners. Conservation biologists could run workshops, and squeeze 
in some much-needed interaction with their peers on the application 
side of the discipline. The need for this may sound obvious, but in
a field so cash-strapped that many conservation projects can’t even
afford to assess their own effectiveness afterwards, it sometimes
seems like a luxury.

Local and national governments with a stake in conservation
should be encouraged to support such training as a cost-effective
means of raising the efficiency of the conservation projects on their
turf, an objective that constituents at both ends of the political
spectrum are liable to support.

But the gap can also be bridged if conservation biologists remem- 
ber to look at all of their professional activities in light of their interest 
— be it practical, moral, aesthetic or even humanitarian — in sav- 
ing species from extinction. In essence, the more time that they can 
spend working with local practitioners on real conservation issues 
the better. 

What is needed is a concerted effort by both academic scientists 
and practitioners to get out of their respective ruts, open up paths 
of communication, share information and seek ever more efficient 
means to a common end.


Deadly consequences 


Health authorities have yet to respond effectively to 
the combination of HIV and tuberculosis. 


Tuberculosis (TB) is not only completely treatable, it is curable and
controllable, and has been so for decades. So it is appalling that
the disease is currently flaring up around the world in an epidemic
of co-infection with HIV, which is also associated with a frightening
increase in strains of TB that are resistant to existing drugs.

This week, the 38th Union World Conference on Lung Health convenes
in Cape Town, South Africa. The main themes of the meeting
will be the challenges of HIV-TB co-infection and multiple-drug
resistance in TB.

The importance of co-infection has been emerging steadily,
especially in Africa, since the early days of the AIDS pandemic.
TB is now the most common opportunistic infection in HIV-positive
patients starting antiretroviral therapy. Such co-infection
presents particularly troubling complications for treatment:
there are overlapping drug toxicities and the risk of a
life-threatening inflammatory syndrome if infection status
is unknown and treatment administered incorrectly.

The South African city of Tugela Ferry presents a startling example
of how an HIV-TB epidemic could play out. The incidence of TB
there is very high, and of some 400 multidrug-resistant cases identified
since 2006, more than half were classified as extensively drug resistant,
meaning that they are resistant to second-line as well as first-line
drug treatments. Most of the resistant infections occur in individuals
co-infected with HIV. Efforts to manage both diseases in patients may
themselves encourage the emergence of drug-resistant strains.


Activists and health-care workers have often sought to blame the 
South African government for its lax response to this crisis. But it 
has also been aggravated by an unfortunate historical divide in the 
worlds of research and health care between those addressing TB and 
those tackling AIDS (see Nature 446, 109-110; 2007). Researchers, 
doctors, health-care workers and the entities that support them need 
to do far more to respond to the scale of the problem that TB presents, 
and its interconnectedness with HIV. Priorities outlined in 2004 by 
the World Health Organization for HIV/TB research have not been 
implemented adequately, according to a report released by the Forum 
for Collaborative HIV Research last week. 

Large parts of sub-Saharan Africa are becoming subsumed by co- 
infection. And although the rate of infection has dropped elsewhere, 
many European and Asian nations still face large numbers of patients 
with active TB infections. A report from the US 
Centers for Disease Control and Prevention last 
month showed that the phenomenon may present a
threat in the United States as well (Morbid. Mortal. 
Wkly Rep. 56, 1103-1106; 2007). One-third of TB 
patients there didn't know their HIV status, despite 
official policy that routine testing be performed on 
everyone with TB. And 9% of those with TB were 
also HIV positive, according to the report. 

The global co-infection epidemic is all the more troubling because 
it was potentially avoidable with better use of existing drugs. The 
rising incidence of drug-resistant TB is now forcing agencies in 
Africa and around the world to react to the scale of the problem. 
The list of needs is a familiar one: better delivery of existing care 
approaches, development of more useful diagnostics, and commu- 
nity-based care. But a bigger mental shift is needed in recognizing 
the size of the problem and its interconnectedness with the AIDS 
pandemic.

Go with the flow
J. Geophys. Res. 112, G04S58 (2007)


Rising temperatures could increase nitrogen 
and phosphorus in the waters off Siberia’s coast, 
altering local biological productivity, a study 
suggests. 

Rivers that drain western Siberia’s peatlands 
(pictured, right) wash nutrients into Arctic waters, 
and previous work has shown that global warming 
could lead to more dissolved organic carbon being 
carried north. Karen Frey at Clark University in 
Worcester, Massachusetts, and her colleagues 
sampled 96 Siberian streams. They estimate that 
levels of dissolved nitrogen and phosphorus, which 
also make the journey north, could both increase
by between about 30% and 50% by 2100.
The extra nutrients are likely to rev up
photosynthetic production in the Ob’ and
Yenisey bays.


Miracle grow 


Science 318, 772-777 (2007) 
Salamanders have the phenomenal ability 
to regenerate a lost limb from a mound of 
stem cells, or ‘blastema’, that forms at
the wound site, but only if nerves are present. 
A protein called anterior gradient (AG) 
bypasses this requirement and offers promise 
for regenerative medicine. 

While examining AG expression 
at amputation sites in Notophthalmus 
viridescens, a type of salamander, Jeremy 
Brockes of University College London and 
colleagues discovered that when there's 
no nerve, there's no AG. Strikingly, the 
introduction of AG into denervated blastemas 
rescued limb regeneration (pictured below),
although the limbs were not fully functional.
If blastemas could be engineered in
mammals, propose the authors, then growth- 
promoting proteins such as AG could be used 
to trigger limb regrowth. 


A predator for lunch 


Proc. R. Soc. B doi:10.1098/rspb.2007.1170 (2007) 
Fossil-hunters in Germany have dug up an 
amazing find — the oldest known example of 
a food chain with three links, or ‘trophic levels’.
The find, which dates back almost 300 million 
years, consists of the remains of a prehistoric 
fish, which was eaten by an amphibian, which 
in turn was gobbled up by a primitive shark. 
The fossil record contains few documented 
predator-prey relationships, because digested 
remains in the gut of larger animals tend 
not to be preserved, explain researchers led 
by Jürgen Kriwet of Berlin’s Natural History
Museum. The fossil shark, a member of the
species Triodus sessilis that was found near
Saarbrücken in southwest Germany, actually
contained the remains of two amphibian 
species, one of which had already feasted on a 
fish called Acanthodes bronni. 


Bigger galaxies earlier 


Astrophys. J. 669, 184-201 (2007) 

Giant galaxies seem to have formed earlier 
than models suggest, say the authors of a new 
survey. 

Roberto Abraham of the University of 
Toronto in Canada and his colleagues used 
images from the Gemini Deep Deep Survey 
and Hubble Space Telescope to study 144 
galaxies between 3 billion and 6 billion years 
old. By examining the concentration of
starlight in each pixel of the images, the team
was able to classify the galaxies by shape.
The results show that, contrary to some
predictions, most of the early Universe's
stars resided in large elliptical galaxies.
The findings will help astronomers rethink
models of galaxy formation.


Inside story 


Nature Nanotechnol. doi:10.1038/nnano.2007.347 
(2007) 
How toxic are carbon nanotubes? That's 
one of the pressing questions in assessing 
possible risks of nanotechnology, which has 
applications in medicine. But the matter is 
hard to study at the cellular level, because it is 
tricky to spot nanotubes entering cells. Unless 
they are fluorescently labelled, the carbon 
tubes are hard to distinguish from carbon- 
based cell structures such as membranes. 
Using a combination of electron and 
optical microscopy, Alexandra Porter at the 
University of Cambridge, UK, and her team 
have now obtained clear evidence of single- 
walled nanotubes — which are only 0.6-3.5 
nanometres in diameter — entering human 
cells. They found that, once inside cells, 
nanotubes accumulate in the cell cytoplasm 
and nucleus, where they cause cell death. 


RNAi on the offensive 


Nature Biotechnol. doi:10.1038/nbt1352; 10.1038/ 
nbt1359 (2007) 
A new and easy way to fight plant pests using 
RNA interference (RNAi) has been suggested 
by two groups working independently. 

RNA interference occurs when short
pieces of RNA are introduced into a cell,
where they bind to a target RNA sequence,
decreasing the expression of that sequence
and of its encoded protein. Xiao-Ya Chen
and colleagues at the Shanghai Institutes
for Biological Sciences in China used this
technique to target an enzyme in cotton 
bollworms that confers resistance to the 
cotton plant’s chemical defences. When plant 
leaves containing a trigger RNA were fed to 
the cotton bollworms, the worms were unable 
to make as much of their defensive enzyme 
against the cotton toxin, and their growth 
was stunted. 

James Roberts of Monsanto Company in 
Chesterfield, Missouri, led a team that took 
its experiments all the way to the field. The 
group engineered corn plants that express 
short RNAs targeted against an essential 
enzyme found in the western corn rootworm. 
They then allowed corn rootworm larvae 
to dine on engineered and non-engineered 
plants for three weeks, and found that the 
engineered plants had much less root damage 
(pictured below, right) than their unprotected 
counterparts (left). 


RESEARCH HIGHLIGHTS


CHEMISTRY

Green cleaver

Science 318, 783-787 (2007)
An iron compound that can selectively
break carbon-hydrogen bonds in organic
compounds looks set to pave the way for
easier, and greener, syntheses.

The carbon-hydrogen bond is ubiquitous
in organic molecules. Breaking it open to add
other chemical groups generally requires a
catalyst. Often, in more complex molecules,
the bond has to be made more reactive
and other parts of the molecule need to be
shielded from activity before the catalyst can
do its job. Both these steps involve potentially
toxic reagents.

Christina White and Mark Chen at the
University of Illinois in Urbana have unveiled
an iron catalyst that can oxidize carbon-hydrogen
bonds using only hydrogen
peroxide, which is relatively benign. The
catalyst can target specific bonds, even in
complicated molecules. For each molecule,
this selectivity is based largely on the inherent
reactivity of the bonds and how accessible
the bonds are to the catalyst.


GENETICS

Light release

Nature Chem. Biol. doi:10.1038/nchembio.2007.44 (2007)
Manipulation of the genetic code has
allowed researchers in San Diego,
California, to produce proteins in which the
amino acid serine is ‘photocaged’.

Changes to the genetic coding
and translational mechanisms in the yeast
Saccharomyces cerevisiae can be used to
produce proteins in which an extra chemical
group masks a specific serine residue, report
Peter Schultz and his colleagues at the Scripps
Research Institute and the Novartis Research
Foundation. The masking group can later be
removed by exposure to visible light.

By selectively illuminating such cells, and
thus choosing when to expose the serine
residues, the researchers were able to study
the circumstances under which Pho4, a
transcription factor, is phosphorylated.
They suggest that this means of exerting fine
control over protein function in vivo could
have wide applicability, and expect in time to
apply it to other amino acids and cell types.


MICROBIOLOGY

Divide and conquer

Chem. Biol. 14, 1119-1127 (2007)
Researchers have found a new treatment
that fights Staphylococcus aureus infections
in mice by shutting down lines of
communication among bacterial cells.

Antibiotic-resistant forms of S. aureus
pose an escalating public health threat.
Kim Janda and his colleagues at the Scripps
Research Institute in La Jolla, California,
report a new type of antibiotic: an antibody
that binds to a signalling molecule that S. aureus
cells use to communicate with each other. This
communication, known as quorum sensing,
regulates the production of some proteins
associated with virulence.

The antibody reduced production of one
such protein, α-haemolysin, and inhibited the
breaking apart of red blood cells in bacterial
cultures. It also prevented S. aureus-induced
skin lesions in mice, and fully protected mice
against lethal doses of the bacterium.


JOURNAL CLUB

Brian J. Enquist
University of Arizona, Tucson, Arizona, USA

An ecologist wonders how biotic feedback matters to
global-change research.

I have increasingly been drawn
to the question of how the biotic
world responds to climatic change.
In the face of environmental
change, biology responds:
organisms often compensate,
adapt and change the nature
of their ecologies. But exactly
how important is this biological
feedback to how ecosystems
respond to a warmer world?

My colleagues and I have called
for a need to focus on quantifying
the importance of what we call the
three As (acclimation, adaptation
and assembly) on ecosystem-level
processes such as carbon flux.

Acclimation is a plastic response
by an organism to a change in the
environment, whereas adaptation
is the end result of natural selection
in populations. Assembly is how
species come to dominate a local
environment and is the result
of ecological interactions. We
know that all these processes are
affected by changes in climate.
The end result of the three As is a
group of species that live in a given
location and control the flow of
resources and energy.

These processes operate on
differing time scales and have
mostly been studied in isolation.
However, two fascinating papers
(K. Ishikawa et al. New Phytol. 176,
356-364; 2007, and C. Campbell
et al. New Phytol. 176, 375-389;
2007) assess the role of both
acclimation processes and
between-species adaptation in
the responses of photosynthesis
and respiration to changing
temperature. Remarkably, they find
that acclimation and adaptive
responses seem to compensate
for temperature-driven changes in
carbon flux.

Putting these two As together
with how species assemble in
ecological communities will
probably reveal generalities in how
evolutionary biology and plant-community
ecology matter in
global change.

Discuss these papers at http://blogs.nature.com/nature/journalclub


Foreign students face
extra UK security checks 


The British government has quietly intro- 
duced a programme of security checks on 
foreign students coming to the United Kingdom 
for graduate studies in the sciences and 
engineering. 

The Academic Technology Approval Scheme 
(ATAS) began on 1 November. It requires all 
graduate students from outside the European 
Economic Area and Switzerland to complete 
an online questionnaire if they intend to study 
any of a broad range of scientific disciplines, 
including biology, physics, chemistry and math- 
ematics. The questionnaire, which includes 
questions about family background, must be 
vetted and approved by UK security agencies 
before students are allowed to apply for visas to 
enter the country. The list of disciplines includes 
41 subject areas, and the government estimates 
that some 23,000 students will be affected. 

The screenings are designed 
to prevent the spread of sen- 
sitive knowledge to foreign 
nationals, according to a 
spokesman for the UK For- 
eign & Commonwealth Office 
(FCO) who asked not to be identified. Sensitive 
fields such as nuclear physics and microbiology 
could easily be turned to malicious purposes, 
he says. “You can think of half a dozen coun- 
tries where you don’t want this technology get- 
ting into the wrong hands,” he adds. 

Some researchers expressed scepticism about
the plan. The vast majority of academic research
lies in the public domain, and it remains unclear
how the government will decide who should
have access to “sensitive” subjects.

“This is not a very intelligent scheme,” says 
Peter Littlewood, chair of the physics depart- 
ment at the University of Cambridge. “It seems 
unlikely to make a positive contribution to 
security, and for most students will be an extra 
hoop to jump through that will encourage them 
to go elsewhere.” 

The ATAS replaces a system of voluntary 
reporting by UK universities. Under that system, 
individual schools notified the FCO if they sus- 
pected a student of pursuing a sensitive subject 
for improper reasons. The government decided 
voluntary reporting was insufficient, accord- 
ing to the FCO spokesman. “The system now is 
going to be much more robust,” he says.

It will also mean more paperwork for
researchers. As part of the ATAS application,
departments will have to provide a brief summary of the
intended course of study. “This is something of a change
as a student doesn't know [their precise
topic] until after they arrive,”
says Robert Hay, academic secretary of the physics
department at Cambridge. Hay adds that
Cambridge is developing an online system to
help students get their summaries quickly.

Critics say the screenings are unfair. “This 
new screening system treats international stu- 
dents with undue suspicion,” claims Gemma 
Tumelty, president of Britain’s National Union 
of Students. 


A US entry screening programme was blamed for a dip in the number of academics visiting the United States after the 11 September 2001 terror attacks.

“We might be denying opportunities for
genuine students,” adds Ali Alhadithi, president
of the Federation of Student Islamic Societies 
in the UK & Ireland. 

Not all researchers contacted by Nature 
were aware of the scheme, but some said they 
would not object to it as long as it didn’t take 
undue time and interfere with the inflow of 
foreign students. “Clearly, there have been 
certain [security] worries with certain tech- 
nologies,” says Neil Ferguson, a mathematical 
modeller of infectious disease at Imperial Col- 
lege London. “If they can approve students in a 
week or two, then fine.”

But a similar American programme that was 
tightened up in the wake of the terror attacks 
of 11 September 2001 caused significant delays. 
Like the ATAS, the US programme required stu- 
dents whose subjects appeared on a “Technology 
Alert List” to undergo further screening. The
system, which required screening by many US 
security services, quickly became overwhelmed 
by applicants, and in its first years, many scien- 
tists experienced delays of months. 

Observers blame the delays for a dip in the 
number of academics visiting the United States 
in the ensuing years, with universities in Britain 
and elsewhere benefiting through increases in 
applications from foreign students (see Nature 
427, 190-195; 2004). “Some of the science and 
engineering departments at major research universities
in the United States saw a big impact,”
says Peggy Blumenthal, executive vice-presi- 
dent of the Institute of International Education, 
a non-profit agency based in New York that 
monitors the flow of international students. 

Any such problems could seriously affect 
both the research enterprise and the financial 
situation at UK universities. “International
students are an important source of income,”
says Bruce Nelson, chair of the Association of
University Administrators, a higher-education
group involved in the government consultation
on the ATAS programme. Foreign students pay
full fees of around £10,000 (US$21,000) a year
to study in Britain, Nelson says.

A new UK scheme screens some foreign students before they can apply for entry visas.

The US scheme is still in place, but delays 
have been largely ameliorated by increases in 
staff, automation of the system and the exten- 
sion of clearances for up to four years. 

The UK government has assured the univer- 
sity community that students will not be hit by 
delays, according to Dominic Scott, chief execu- 
tive of the UK Council for International Student 
Affairs, which promotes international student 
mobility. The screening will be free to appli- 
cants and require no additional documentation 
beyond a brief summary of their studies, Scott 
says. Most importantly, the FCO has promised 
to process applicants quickly, he explains. “We 
are assured that the vast majority will receive 
their answers within seven to ten days.” 

But Alhadithi remains concerned that the 
screening system could unfairly hinder students 
whose primary goal is simply to improve life in 
their home countries. The FCO has provided 
little information on how it plans to distinguish 
the few potential security risks from the major- 
ity of hard-working students, he says. “We need 
more information about the triggers: what 
exactly is going to cause a red mark?” 

“This is not an area where sane discussion 
prevails at the moment,” Littlewood says. 
“We will do our best to smooth the process 
of recruiting foreign students while working 
within the rules.” 

Geoff Brumfiel

Excessive fat intake can
throw out the body clock 


The body's circadian rhythm, the internal
‘clock’ that regulates physiological processes,
can be shifted by eating more fat, say
researchers. This surprising finding suggests
that a more complex interplay exists between
the body clock and metabolism, with
implications for disorders such as diabetes
and obesity.

The circadian rhythm is a near-24-hour cycle 
that is known to be modulated by sunlight 
and eating schedules. Previous studies have 
shown that a disrupted circadian rhythm leads 
people to crave high-fat foods. And a study 
out this week shows that children who lack
sleep risk being overweight. This is an issue
of increasing concern as researchers attempt
to elucidate the link between disturbances in
circadian rhythm and health conditions such as
obesity, heart disease and diabetes.

Joseph Bass, an endocrinologist at 
Northwestern University in Evanston, Illinois, 
fed a group of male mice a diet in which 45% 
of the calories were derived from fat, and 
monitored their daily wheel-running schedule. 
Mice given high-fat food had 23.8-hour daily 
cycles, whereas the body clock in control 
mice, whose caloric intake included only 16% 
fat, was 23.6 hours long. The internal time
change occurred before the mice had gained 
any weight, although the researchers did not 
measure changes in body-fat percentage. 

“This is the first time that a paper has 
really shown the impact of feeding on the 
molecular and behavioural expression of the 
circadian rhythm,” says Eve Van Cauter, a sleep 
researcher at the University of Chicago in 
Illinois, who was not affiliated with the study. 
“In a human, this would mean the person
would have increased difficulties going to bed
at a reasonable time,” she says. “That might
result in insomnia or night-eating”, which
further boost the risk of obesity and
diabetes.

The link between circadian 
rhythm and metabolism is not 
surprising, says Bass, because the 
two systems share many molecular 
signalling pathways. The expression 
patterns of some genes involved in 
lipid metabolism change in 24-hour 
cycles, and several nuclear receptors 
that are activated by sterols regulate 
expression of clock-related genes.

In addition, mice bearing mutations in 
the circadian-rhythm gene Clock show signs of
metabolic dysregulation, including obesity and
altered expression of genes involved in appetite
regulation.

Precisely how a fatty diet could disturb the 
circadian clock remains elusive, but the hunt is 
on. “In discovering these molecular switches 
that couple metabolic and circadian systems, 
we might actually uncover new pathways or 
targets to alter metabolic state,” says Bass.
Research has shown that the activity of two
clock-regulating proteins depends on the
nutrient status of the cell, providing one
possible molecular connection, he says. 

Others note that the connection between 
nutrient status and clock length may be 
indirect. The eating habits of mice on the 
high-fat diet also altered — they ate more and 
consumed more calories during the day, when 
mice normally sleep. It could be that the change 
in eating habits, rather than a direct effect of 
specific nutrients, altered their body clocks, 
says Hitoshi Ando at Kanazawa University in 
Japan. Ando has studied the impact of fatty 
food on the body clock in female mice, but 
found only a minimal effect. The reason for that
discrepancy is unclear, he says. 

Heidi Ledford 
Attack of the genomes


How many genome sequences do you need to characterize a model organism? 
For Drosophila, Heidi Ledford finds, a dozen is a good start. 


There was a time not so long
ago when sequencing a single
genome was cause for
celebration. If that genome was 
from a eukaryote, so much the 
better. A multicellular eukaryote? 
Then break out the champagne. 

The bar has now been raised 
even higher with the publication in 
this issue of full genome sequences 
from, not one, but ten fruitfly spe- 
cies, to add to the two sequenced 
previously (refs 1, 2). Getting the genome
sequence for one’s favourite organ- 
ism is still an achievement, but 
researchers are realizing that to 
truly understand how genomes 
function and evolve, they need 
points of comparison. 

The Drosophila research com- 
munity is not the only one benefit- 
ing from comparative genomics. 
More than 20 vertebrate genomes 
have been published or are being 
sequenced, and more are on the 
way, in projects often funded solely 
with the aim of using the sequences 
to improve understanding of the 
human genome. 

Daniel Hartl, a geneticist at 
Harvard University who stud- 
ies vineyard yeasts as well as 
Drosophila, says that he has lost 
count of how many yeast genomes 
have been sequenced. “I don't 
even keep track. It’s like the Broad 
Institute sequences one of these 
before breakfast,” he says, refer- 
ring to the genomics centre in 
Cambridge, Massachusetts. The 
genomes of 22 species of yeast have 
been published, with another four 
on the way, bringing the number 
of sequenced fungal genomes to 
more than 60. 

Given the plummeting price 
and escalating power of sequencing 
technology, researchers can now afford to be 
a little greedy. “At one stage even I thought: 
‘This is ridiculous, we’re getting more and
more genomes’,” says Greg Elgar, a vertebrate
genomicist at Queen Mary, University of 
London. “But it really does give us new insight 
into evolutionary processes.”

By placing these vast repositories of informa- 
tion side by side, sequence-gazers have been able 
to trace the evolution of genomes. It’s no sleepy 
drama. Chromosomes fragment and rejoin 
in different orientations, and entire genomes 
have duplicated themselves. Genes
have been tossed from region to 
region, sometimes coming under 
the influence of new regulatory ele- 
ments that alter the time and place 
of the genes’ expression. These 
chaotic changes only begin to 
explain the vast phenotypic differ- 
ences between related species (see 
‘Dew-loving all-stars’). Many of the 
genomic changes had been inferred 
from laboratory experiments and 
snippets of sequence, says Antonis 
Rokas, a geneticist at Vanderbilt 
University in Nashville, Tennessee, 
but these whole-genome studies 
provide the full script and confirm 
the story of genomic rearrange- 
ment. Nevertheless, inferring the 
source of the turmoil remains a 
challenge. “There are many differ- 
ent paths by which you can create 
a scrambled genome,” says Rokas.
“Identifying the most likely path is 
very challenging.” 

The immediate benefit of 
comparing genome sequences 
is the increased precision with 
which researchers can reveal the 
sequences that have been care- 
fully preserved over time, imply- 
ing that they have an important 
role in the organism (ref. 3). Alternatively,
these comparisons can
pinpoint sequences that differ in 
just one species or a group of spe- 
cies. Subsequent lab experiments 
can determine whether and how 
those sequences — and not all of 
them are protein-coding genes 
— yielded a behavioural or mor- 
phological trait unique to that 
group, translating the genomes’ 
most mysterious bits. “Sequences 
don't come with an index,” says 
Hartl. “We don’t really know what 
the sequences mean.”

Even the relatively easy sequences to sort 
out — those that code for protein and are 
traditionally thought of as genes — can be 
challenging to nail down. The Drosophila 
melanogaster genome, sequenced in 2000
(ref. 4), nearly a century after Thomas Hunt
Morgan characterized his first white-eyed
mutant, has a stellar reputation for ‘annotation’,
as members of the community have supplemented
computer-driven predictions with
expert annotation of potential genes. Yet anal- 
ysis of the 12 Drosophila genomes surprised 
observers by revealing hundreds of protein- 
coding genes in D. melanogaster that had been 
either misannotated or missed completely. 

Still more challenging has been to determine
which regions of the genome regulate when and
where genes are expressed, giving the stage
directions for building a living animal. Little
is understood about how these regions work
and evolve. When the mouse genome sequence
was completed (ref. 5), there were hopes that
human regulatory sequences could be fished
out by looking for all the ‘conserved’
non-coding sequences (that is, those that have
remained unchanged over thousands of years
of evolution). “One of the big surprises in the
genomics community was just how hard that
was,” says Andrew Clark, a geneticist at
Cornell University in Ithaca, New York.

The proteins that drive gene expression
typically bind to small sequence motifs that can
be as short as ten nucleotides. Such motifs will
appear by chance many times in the genome,
making it difficult to sort out those that are
conserved for a purpose, especially as some
regulatory elements are located thousands of
nucleotides away from the genes they regulate.
Regulation of gene expression can also be
affected by a gene’s location in the genome,
and by chemical modifications to the DNA
that surrounds it.

“When the genomes started coming out, a
lot of people thought they could track the
regulatory code just by comparing sequences,” says
Nicolas Gompel, a developmental biologist at
the Institute of Developmental Biology at
Marseille-Luminy, France. “That would have been
really nice, but unfortunately it doesn’t work,”
he says. “You do find patterns, but they’re not
necessarily relevant.” Large-scale projects such
as ENCODE, which is designed to use
comparative genomics to hunt for human regulatory
sequences, and modENCODE, which aims to do
the same in fruitflies and nematodes, aspire
to improve these predictions.


Dew-loving all-stars


The Drosophila species that have had their genomes sequenced differ quite a bit physically. Here is a small sample.


Drosophila melanogaster

Genome size: 117 million bases
Chromosomes: 4

Of interest because: This fly
has redefined itself and genetics
several times during the past
century. Gene mapping was
invented on the huge polytene
chromosomes in its salivary
glands. And D. melanogaster is the
only Drosophila species that can
be reliably manipulated by genetic
engineering.


D. grimshawi

Genome size: 201 million bases
Chromosomes: 6

Of interest because: A giant
among the drosophilids, this
large fly's showy wings are useful
for studies of development and
mating behaviour. D. grimshawi
is also used to study fruitfly
evolution and population biology
in its native home of Hawaii. The
Hawaiian islands are host to about
one-third of all Drosophila species.


Sometimes a little shuffling can help pre- 
diction programs to home in on regulatory 
sequences. When researchers compared known 
regulatory elements directly upstream of genes 
in D. melanogaster with the same regions in 
D. pseudoobscura, they found that short motifs 
of 8-10 nucleotides had been conserved, but 
that the order of the motifs had been jumbled. 
“Motifs that had been previously characterized
were all scrambled,” says Hartl, possibly
because the tiny regions had been duplicated 
by chance elsewhere, leaving the original motif 
free to decay. Annotation programs now com- 
pare genomes and look for short sequences that 
have been conserved but shuffled as a hallmark 
of these regulatory elements. 
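
The counting argument behind this approach can be made concrete. The sketch below is illustrative only and is not the method used by any annotation program mentioned here: it assumes uniformly random, independent bases (which real genomes are not), counts exact matches on one strand, and uses hypothetical function names and toy sequences. Under those assumptions a ten-nucleotide motif would turn up by chance on the order of a hundred times in a genome of 117 million bases, which is why conservation of the motif set, rather than a single match, is the more useful signal.

```python
def expected_chance_hits(genome_size: int, motif_length: int) -> float:
    """Expected exact matches of one fixed motif on one strand,
    assuming every base is drawn uniformly and independently (a simplification)."""
    positions = genome_size - motif_length + 1
    return positions / 4 ** motif_length


def motif_order(sequence: str, motifs: list[str]) -> list[str]:
    """Motifs that occur in the sequence, sorted by position of first occurrence."""
    first_hit = {m: sequence.find(m) for m in motifs}
    found = [m for m in motifs if first_hit[m] >= 0]
    return sorted(found, key=lambda m: first_hit[m])


def conserved_but_shuffled(seq_a: str, seq_b: str, motifs: list[str]) -> bool:
    """True if every motif is present in both sequences but their order differs."""
    order_a = motif_order(seq_a, motifs)
    order_b = motif_order(seq_b, motifs)
    all_present = len(order_a) == len(motifs) and len(order_b) == len(motifs)
    return all_present and order_a != order_b


if __name__ == "__main__":
    # Genome size taken from the D. melanogaster figure quoted in the sidebar.
    for k in (8, 10):
        print(k, round(expected_chance_hits(117_000_000, k), 2))

    # Toy upstream regions (made up for illustration, not real fly sequence).
    motifs = ["ACGTACGT", "GGCCAATT"]
    region_a = "AAACGTACGTTTTGGCCAATTAAA"
    region_b = "CCGGCCAATTCCACGTACGTCC"
    print(conserved_but_shuffled(region_a, region_b, motifs))
```

A real pipeline would also have to handle reverse complements, inexact matches and repeated motifs, none of which this sketch attempts.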


D. mojavensis

Genome size: 193 million bases
Chromosomes: 4

Of interest because: D. mojavensis
survives the harsh environment
of the Sonoran desert in the
southwestern United States by
drinking the juice of toxic cacti.
Despite the dry habitat, males
of the species lose 2-3% of their
body weight every time they
ejaculate. They mate several
times a day.


Other researchers hope to use the Drosophila 
genomes to identify puzzling cases in which the 
genetic regulation of a pathway has changed, 
yet the output of the pathway has remained the 
same. Researchers have found a few isolated 
cases of this ‘transcriptional rewiring’ in yeast 
and Drosophila, leading some to speculate that 
it might be a trend rather than a trivial chance 
occurrence. Comparative sequence data are cru- 
cial for uncovering these examples, says Alex- 
ander Johnson of the University of California, 
San Francisco. “Most ‘evo-devo’ studies would 
miss this type of circuit change,” he says, because
classical studies start with the overt differences 
between species and work backwards. 

Understanding genetic circuitry is precisely 
the kind of area that will benefit from Dro- 
sophila sequences, says Elgar, who has been 
studying transcriptional rewiring in vertebrates. 
Genomic studies require careful follow-up 
with wet lab experiments; D. melanogaster has 
played a starring role in experimental research 
for more than a century and is up to the job. “I 
don’t work on Drosophila but sometimes I wish 
I did,” Elgar says. “Now everybody's going to
want to join the fly community.”


1. Drosophila 12 Genomes Consortium Nature 450, 203-218 (2007).
2. Stark, A. et al. Nature 450, 219-232 (2007).
3. Nature 449, 10-11 (2007).
4. Adams, M. D. et al. Science 287, 2185-2195 (2000).
5. Mouse Genome Sequencing Consortium Nature 420, 520-562 (2002).


For more on the Drosophila genomes see articles 
starting on page 183. 


D. pseudoobscura 

Genome size: 156 million bases 
Chromosomes: 4 

Of interest because: The second 
fruitfly genome to be sequenced, 
D. pseudoobscura was a favourite 
of the geneticist Theodosius 
Dobzhansky. He studied evolution 
in natural populations of the fly in 
the 1930s, and looked at some of 
the chromosomal rearrangements 
now evident by comparing genome 
sequences. 
Curry

Japanese curry-lovers can now experience
the taste of space, as pouches of
curry identical to those eaten on
the International Space Station
have gone on public sale.


Bacon

The World Cancer Research Fund has
branded the breakfast treat as
one of the top food no-nos if you
want to avoid cancer.


Dolphin danger

Conservationists want to stop
children with disabilities such
as autism from swimming
with dolphins. Far from being
therapeutic, they say the aquatic
mammals’ play is a danger to kids.


“An airport with 50
million passengers and
countless take-offs and
landings per day is not
the place for a large
wild cat community.”


Pasquale DiFulco of the New York 
port authority explains the city's 
decision to round up the estimated 
75 feral cats roaming JFK airport. 


people in India 
do not have access to toilets. 


(US$255 million) has been pledged 
by the Indian government to build 
toilets for the country’s poor. 


is the deadline the 
government has set for eradicating 
open-air defecation. 


Sources: Reuters, The Times, Mainichi 
Daily News 
Poor follow-up hampers
malaria projects 


The incidence of malaria in some African 
countries may soon approach that of the east- 
ern Mediterranean as a result of increased use 
of insecticide-treated mosquito bednets, spray- 
ing and more effective drugs. The first analyses 
of the effects of such interventions in the field 
indicate that they have had a direct and major 
effect on the malaria burden in Kenya and in 
Tanzania's Zanzibar archipelago. 

However, a lack of global coor- 
dination on eradication projects 
and poor data evaluation are 
jeopardizing malaria-control 
programmes in the worst- 
affected parts of the continent. Studies of the 
actual impact of control programmes on public 
health and mortality are surprisingly few, and as 
a result there is a worrying paucity of data. 

“The biggest flaw in current malaria-control 
efforts is that we need to invest more in disease 
surveillance systems to know the true story of 
what is really happening in Africa,” says Mark 
Grabowsky, Malaria Program Manager at the 
Global Fund to Fight AIDS, Tuberculosis and
Malaria. What's needed, says one prominent 
international health official who wishes to 
remain anonymous, is a greater international 
focus to put in place tools to compare data and 
standardize protocols. This has been achieved 
for diseases such as polio and measles, in which 
standardized data are available by district and 
month in Africa. There are no data of compara- 
ble quality for malaria, he says, adding that most 
reports of successes have been anecdotal. 

One study to evaluate the success of these
programmes was carried out by Bob Snow’s group at
the Kenya Medical Research Institute (KEMRI)- 
Wellcome Trust Collaborative Research Pro- 
gramme in Nairobi, and an international team. 
The researchers found that paediatric malaria 
admissions at hospitals on the Kenyan coast 
have fallen by up to 63% since 1999, as a result 
of interventions such as new drugs called
artemisinin-based combination therapies (ACTs) (ref. 1).
Snow’s group also studied about 
3,500 children in 72 rural areas 
of Kenya and found that bednet 
use was linked to a 44% reduction
in mortality.

Snow says he is convinced that the decreases 
in malaria in Kenya are a direct result of the rise 
in bednet coverage in the zone between 2004 
and 2006 from 7% to 67% of children, and the 
fact that 85% of rural clinics now stock ACTs 
that were non-existent only a few years ago. “I 
think we are going through an epidemiological 
transition because of scaling up of intervention 
coverage,” says Snow, who believes that the
epidemiology of malaria in many African
countries is, as a result, shifting closer to that of the
eastern Mediterranean region, where malaria
incidence is at a more containable level of fewer
than 10 in every 1,000, compared with 350 in 
every 1,000 Africans. 

Data from Zanzibar, published this week by 
Achuyt Bhattarai at the Karolinska Institute in 
Stockholm and his colleagues, also show that 
malaria deaths dropped to one-quarter of pre- 
vious levels between 2002 and 2005 after the 
introduction of ACTs and widened use of bednets.
Similarly encouraging preliminary data
are coming in from Ethiopia, Eritrea, Mozam- 
bique, South Africa and Rwanda. 

But collecting and analysing rigorous data 
is difficult. Snow had to ensure that his data 
covered a long enough period, and that his 
models accounted for rainfall and other fac- 
tors that make teasing out the direct impact of 
interventions difficult. 

Until now, most of the data have emerged in 
a fragmented way from organizations with a 
vested interest in the figures released. A report 
that has been billed as a success story, released by 
the United Nations Children’s Fund (UNICEF) 
on 17 October, showed that annual global pro- 
duction of insecticide-treated bednets soared 
from 30 million nets in 2004 to 63 million nets 
in 2006, and orders of artemisinin jumped from 
3 million doses in 2003 to 100 million in 2006. 

But experts argue that such organizations 
often release data more for advocacy than 
to assess operations. The spin on the figures 
masks the fact that all countries are far short 
of the targets set by Roll Back Malaria of 80%
coverage of all interventions by 2010 — most 
have not achieved a fraction of that. 

One key test will be in Zambia, in a pro- 
gramme launched in 2005 and funded by the Bill 
& Melinda Gates Foundation. It aims to reduce 
deaths from malaria by 75% by 2008, through 
a huge scale-up of bednets, drugs and house 
spraying. Results submitted for publication 
show that in households with bednets, parasite 
prevalence in children and anaemia in infants fell 
from about 20% to 13%. At the end of 2006, the 
project had 20% of children under bednets; 40% 
of households owned a net; and spraying reached 
34% of targeted households. In absolute terms 
these are still “miserably low numbers”, points
out one expert, adding that the overall malaria 
effort falls far short of its own targets. 

The few countries where progress on mor- 
tality is being reported all fall within zones 
in Africa that are moderate both in terms of 
the intensity of transmission (the number of 
infected mosquito bites per person) and the 
length of the malaria season. 

No one has yet shown a reduction of malaria 
in countries such as the Democratic Republic 
of Congo and Nigeria, where transmission rates 
are so high that preventative measures might 
have little impact on mortality. These two coun- 
tries alone account for around half of malaria 
mortality in Africa, but poor management and 
health systems mean that they remain laggards 
in implementing malaria control measures, let 
alone evaluating their impact.
Declan Butler 
1. Okiro, E. E. et al. Malaria J. (in the press). 
Committee releases shortlist
of Mars landing sites 


Six potential landing sites have been chosen for 
NASA's Mars Science Laboratory, a large rover 
set to assess the past habitability of sites on 
the planet's surface, when it lands in October 
2010. The shortlist, chosen from dozens of 
possibilities, includes craters partly filled with 
sediment, an ancient flood channel and regions 
rich in clay minerals thought to date from an 
era when the martian surface was wetter than 
it is today. However, changes 
to the mission's scope mean 
that options that might offer 
excellent science could end up 
being dismissed as impractical. 

The mission has a ‘landing 
ellipse’ roughly 20 kilometres 
across to account for the uncertainties 
involved in guiding a spacecraft over millions 
of kilometres to a soft landing on a windy 
planet. The terrain in the ellipse needs to 
be smooth and flat. “If you ask an engineer, 
they'd like to land in a Walmart parking lot,”
says Jack Mustard, a planetary geologist at 
Brown University in Providence, Rhode Island. 

As originally conceived, once landed, 
the rover would have been able to travel 
well outside this ellipse to places neither 
smooth nor flat — the sorts of outcrop that 
geologists favour. But participants at last 
month's workshop to choose the candidate 
sites, hosted by NASA's Jet Propulsion 
Laboratory in Pasadena, California, found that 
sites where the rover would need to travel 
10 kilometres or more to obtain samples were 
now being flagged as possibly problematic. 

“I would have been screaming at that,”
says Ken Edgett, of Malin Space Science
Systems of San Diego, who is a principal 
investigator on the mission but was unable to 
attend the workshop because of the wildfires 
in California. “It limits your expectations,” 
says Mustard, who favours a site in the Nili 
Fossae region from which the rover would
be able, if all went well, to sally forth to a
region of dramatic erosion that he has dubbed 
Monument Valley. 

Changes in the way that 
the rover’s moving parts will 
be lubricated raise issues for 
sites in the planet's southern 
highlands, as they reduce the 
rover's capabilities in winter 
conditions. Nevertheless, the scientists 
shortlisted two southern sites as worthy of 
further study. One of them, Holden Crater, 
contains what seem to be lake sediments and 
a delta. “It's awesome,” says Mustard. 

The shortlisted sites will now be 
scrutinized further by instruments on board 
NASA's Mars Reconnaissance Orbiter, which 
is currently circling the planet. At the same 
time, computer models will assess the risk 
of winds at the sites being strong enough 
to mess up the landing. It is very unlikely 
that all six will be considered too risky, says 
John Grant of the Smithsonian Institution's 
Center for Earth and Planetary Studies in 
Washington DC and co-chair of the site- 
selection committee. The final decision does 
not need to be made until nearer the launch in 
October 2009.
Oliver Morton 
Mars projection with shortlisted landing sites, two of which are in the southern hemisphere.

Mysterious force could be
an ‘artefact’ of a void in
space.

Peering into the heart of a black hole


Quantum mechanics might be capable of strip- 
ping bare a black hole to reveal the mysterious 
and unseeable ‘singularity’ that exists at its 
heart, say George Matsas and André da Silva
of the São Paulo State University in Brazil.

It has long been suspected that these sin- 
gularities — where the known laws of physics 
break down — are always decorously veiled 
behind the ‘event horizon’, a boundary beyond
which light cannot escape from the fearsome 
gravitational pull of a black hole. Theoreti- 
cally, nothing within an event horizon can 
ever be perceived or investigated by an out- 
side observer, because no light can escape. So 
the singularities remain insulated from the 
rest of the Universe. 

This amounts to what in 1969 physicist 
Roger Penrose called ‘cosmic censorship’,
whereby the laws of physics conspire to save 
us from having to gaze on the unthinkable. 
According to Einstein's general theory of rela- 
tivity, in the middle of a black hole, its mass 
collapses in on itself to form an infinitely small, 
infinitely dense point, where space-time itself 
is punctured. Even causality — the relation of 
a cause and its effect — breaks down, which 
seems to defy not only physics but logic. “Pen- 
rose’s motivation seemed to be to preserve the 
decorum of physics,” Matsas says. 

But physicists have wondered whether 
event horizons are ever stripped away, leaving 
these absurdist singularities naked. One pos- 
sibility, for example, is that the event horizon 
might vanish if a black hole spins very fast. 
Light and matter might then be flung out by 
centrifugal force. 
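
The idea that a fast enough spin removes the horizon can be stated as a simple inequality. What follows is the standard textbook result for a rotating, electrically charged (Kerr-Newman) black hole in general relativity, written in geometrized units (G = c = 1); it is offered as background and is not taken from the work discussed in this article. The symbols M (mass), J (angular momentum), Q (charge) and a = J/M are introduced here for illustration.

```latex
% Horizon radii of a Kerr-Newman black hole (geometrized units, G = c = 1)
r_{\pm} = M \pm \sqrt{M^{2} - a^{2} - Q^{2}}, \qquad a = \frac{J}{M}

% An event horizon exists only while the square root is real:
a^{2} + Q^{2} \le M^{2}

% Equality is the "extremal" case, right on the brink;
% a^{2} + Q^{2} > M^{2} would leave the singularity naked.
% For an uncharged hole (Q = 0) the condition reduces to |J| \le M^{2}.
```

In this language, "overspinning" a black hole means pushing a² + Q² above M².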

No light escapes the event horizon, so the infinitely dense singularity remains hidden.

In September, physicists Arlie Petters of
Duke University in Durham, North Carolina,
and Marcus Werner of the University of
Cambridge, UK, proposed that singularities
stripped naked by fast rotation should be
detectable by astronomers because they act as
very strong ‘gravitational lenses’, bending the
light coming from the stars behind them by
their distortion of space-time. Petters
and Werner say that existing telescopes should
have sufficient spatial resolution to spot naked
singularities in the centre of our own Galaxy.

But how could a black hole spin fast enough
to bare its heart? It was shown in the 1970s
that a black hole’s spin cannot be increased by
swallowing rotating objects, because the gain
in angular momentum (the momentum
caused by rotation) is generally balanced
by the slowing influence of the extra mass. To
transfer enough angular momentum to the
hole to overspin it into nakedness, a particle
would have to approach the event horizon at
such a high speed and such a glancing angle
(similar to stroking the side of a spinning-top)
that it wouldn't get sucked inside the event
horizon in the first place.

No way of creating a naked singularity has
yet been discovered within the framework of
the classical physics described by the theory of
relativity. But in a paper published last week,
Matsas and da Silva propose that quantum
mechanics, which normally applies only to very
small objects, could subvert cosmic
censorship. An electrically charged black hole
spinning fast enough to be right on the brink
of losing its event horizon might be pushed
over the edge by gaining angular momentum in a
non-classical way, they conclude.
Quantum particles have the strange property
of being able to ‘tunnel’ through barriers that,
according to classical mechanics, they have
insufficient energy to pass over or through.

So whereas classical particles with enough 
angular momentum to overspin a black hole 
can’t get inside the event horizon for the reasons
explained in the 1970s, Matsas and da
Silva’s study finds that quantum particles 
could tunnel inside, and send the black hole 
over the brink into nakedness. “It’s a neat idea,”
says Petters. “One issue is whether such black
holes exist, or if they will remain charged long
enough for us to catch one, since charged black
holes tend to become neutralized.”

Matsas himself cautions that the work does 
not necessarily mean that cosmic censorship is 
violated in reality, because quantum theory is 
known to be incomplete. The general theory of 
relativity that explains gravity and predicts black 
holes, and the theory of quantum mechanics, 
are known to be fundamentally incompatible, 
and physicists hope that they might ultimately 
be reconciled in a quantum theory of gravity. 

Whether such a theory would rescue cosmic 
censorship, says Matsas, remains to be seen. But 
he says that “we don’t see any compelling reason 
to preclude the existence of naked singularities 
in the context of quantum gravity”. Such an 
improved theory should in fact help make sense 
of what naked singularities are like in the first 
place. “It is widely believed that quantum grav- 
ity will unveil the structure of the singularities,” 
says Matsas, adding that they will then probably 
seem “quite benign to physics” rather than the 
monstrosities they now seem to be.
Philip Ball 
NEWS IN BRIEF


Congress to vote on open 
access and NIH funds 


US investigators funded by the National 
Institutes of Health (NIH) may soon be 
compelled to publish only in journals that 
make their research papers freely available 
within one year of publication. 

Congress is this week expected to take 
final votes on a bill incorporating this 
directive. The measure is contained in a
spending bill that boosts the biomedical 
agency’s effective budget by 3.1%, to 
$29.8 billion in 2008. 

President George W. Bush has vowed 
to veto the bill, which will fund the 
Department of Health and Human Services 
and other agencies, because it includes what 
he calls “irresponsible and excessive” levels 
of spending. 

But congressional Democrats have 
attached to the measure an unrelated 
but politically popular bill funding the 
Department of Veterans Affairs. They 
hope that this will generate the two-thirds 
support needed in both houses of Congress 
to override a presidential veto. 

The open-access requirement in the bill 
would apply only during fiscal year 2008; 
it would need to be renewed in yearly 
spending bills in the future. 


Argo system makes a 
splash with final float 


A global network of floats gauging the vital 
signs of the world’s oceans was completed last 
week, with the launch of the 3,000th device. 
Buoy-like floats in the Argo project
periodically dive to depths of 2,000 metres,
where they drift for 10 days recording
temperatures, salinity and current velocity,
and then surface, sending the data to
a satellite for transmission to a central
repository (see Nature 415, 954-955; 2002).
More than 30 nations in the Argo system will
use the data to create ocean profiles, which
then will be monitored for changes over time.

Floating vote: crew onboard the Kaharoa deploy
the 3,000th device in the Argo network.


Eight years after deployments began, the 
New Zealand research vessel Kaharoa on 
1 November dropped what were designated 
as the final floats at latitude 45° south in the 
southern Pacific Ocean. 


Biomedical agency puts 
epigenetics on the map 


The US National Institutes of Health (NIH) 
is set to roll out the latest highway on its 
‘roadmap’ for medical research (see Nature
448, 406-407; 2007). It is seeking project 
proposals worth $191 million in epigenetics. 

The agency already spends about $240 
million per year on epigenetics, the study of 
stable, inherited genetic modifications that 
affect gene expression and function without 
altering the DNA sequence. 

Several projects will be funded in the 
push. These include the development 
of ‘reference epigenomic maps; studies 


Factory delay leaves flamingos in the pink 


The lesser flamingos 
(Phoenicopterus minor) of 
Tanzania's Lake Natron 
(pictured) may get a temporary 
reprieve from a US$400- 
million soda-ash plant that 
was to have been built nearby. 
An environmental advisory 
committee has recommended 
the government block the 
factory's construction unless 
its Indian-Tanzanian developer 
provides more details of plans 
to protect the local ecosystem. 


Environmentalists are up in arms over the factory because the lake is a major breeding ground 
for East Africa's roughly 2 million lesser flamingos and also home to a number of rare species, says 
Lota Melamari, chief executive of the Wildlife Conservation Society of Tanzania. 

Melamari, who served on the advisory panel, says developers presented few details about how 
the plant would affect the lake ecosystem. “The main concern was a lack of information,” he says. 
The government is now deciding how to handle the proposal. 
Congress to vote on open 
access and NIH funds 


US investigators funded by the National 
Institutes of Health (NIH) may soon be 
compelled to publish only in journals that 
make their research papers freely available 
within one year of publication. 

Congress is this week expected to take 
final votes on a bill incorporating this 
directive. The measure is contained in a 
spending bill that boosts the biomedical 
agency’s effective budget by 3.1%, to 
$29.8 billion in 2008. 

President George W. Bush has vowed 
to veto the bill, which will fund the 
Department of Health and Human Services 
and other agencies, because it includes what 
he calls “irresponsible and excessive” levels 
of spending. 

But congressional Democrats have 
attached to the measure an unrelated 
but politically popular bill funding the 
Department of Veterans Affairs. They 
hope that this will generate the two-thirds 
support needed in both houses of Congress 
to override a presidential veto. 

The open-access requirement in the bill 
would apply only during fiscal year 2008; 
it would need to be renewed in yearly 
spending bills in the future. 
of epigenetic contributions to ageing, 
development and disease, and responses 
to environmental exposures; the discovery 
of new epigenetic targets; and the 
development of technology, data analysis 
and computational infrastructure. The 
deadline for proposals is March, and five- 
year funding will begin next autumn. 


San Francisco gets a green 
natural history museum 


Last week, the California Academy of 
Sciences received the keys to its new 
environmentally friendly headquarters. The 
building sits on the site of its historic home, 
which was damaged during the 1989 Loma 
Prieta earthquake. Nestled in scenic Golden 
Gate Park, in environmentally conscious 
San Francisco, it is in an ideal location for an 
ecologically inspired museum. 

The new $484-million building was 
designed by architect Renzo Piano and 
incorporates so many green design features, 


Going green: a model of the California Academy 
of Sciences’ new home. 


including a green roof (pictured above) and 
insulation made from recycled blue jeans, 
that it beats the energy-use standards set 

by the US Department of Energy by 30%. 
Even the steel and rubble from the old 
headquarters were recycled to make other 
buildings and new roads. 

It is also expected to be the first museum 
to earn the highest stamp of approval from 
the US Green Building Council’s Leadership 
in Energy and Environmental Design Green 
Building Rating System — a nationally 
accepted set of benchmarks for green design. 


Partnership paves way for 
global carbon market 


A coalition of countries, US states and 
Canadian provinces formed a partnership 
last week to promote the establishment of a 
global carbon-trading market. 

Officials billed the International Carbon 
Action Partnership as a central repository 
for sharing information among various 


nations and coalitions that are adopting 
market-based regulations for greenhouse 
gases. The goal is to align the development 
of independent markets so that they can 
serve as the foundation for an integrated 
global market. 

The European Union has a functioning 
carbon-trading market under the Kyoto 
Protocol, and the Chicago Climate 
Exchange, a smaller market based on 
voluntary emissions reductions, is 
operational in the United States. 

The new coalition includes nine members 
of the European Union, the European 
Commission, ten US states and two Canadian 
provinces that are organizing two regional 
greenhouse-gas markets. New Zealand and 
Norway are also founding members. 


White males maintain pole 
positions in US science 


If you are studying science in the United 
States, the chances are that your mentor 

is a white male. And although more 
underrepresented minorities and women are 
earning degrees, fields such as chemistry and 
mathematics are among the worst in helping 
them make the leap to faculty positions, 
according to a report led by Donna Nelson, 

a chemist at the University of Oklahoma in 
Norman (see http://tinyurl.com/yqwjyq). 

The department-by-department 
breakdown of the 100 top-spending science 
and engineering departments shows that 
some fields are more inclusive than others. 
In sociology, one of the best disciplines 
at training minorities, the percentage of 
blacks, Hispanics and Native Americans 
earning PhDs equals the percentage of 
assistant professors from those groups in the 
top 50 departments surveyed. In chemistry, 
by contrast, minorities earn 8.5% of PhDs 
— but just 3.7% of all professorships and 
4.7% of assistant professorships. 

For women, children and lower self- 
confidence may help to explain the gap, 
suggests a separate survey by the US 
National Institutes of Health of more than 
1,300 postdocs (E. D. Martinez et al. EMBO 
Rep. 8, 977-981; 2007). Women are more 
likely than men to sacrifice their careers 
for kids, the survey found, and 60% of 
males versus 40% of females felt confident 
that they would find a faculty job after 
completing their postdoc. 


Correction 

The News Feature ‘Space invaders’ (Nature 448, 
746-748; 2007) stated that Omar Yaghi was 

the first to design a metal-organic framework 
(MOF) in 1998. But in the early 1990s, before 
the term MOF was coined, a similarly open 
three-dimensional polymeric network structure 
linking organic ligands and metal centres had 
been reported (B. F. Hoskins and R. Robson J. Am. 
Chem. Soc. 112, 1546-1554; 1990). 
Carbon tax lite 


The impact of a British tax on greenhouse-gas emissions has faded over time, as Geoff Brumfiel reports. 


Eight years ago Gordon Brown, then 
Britain’s chancellor of the exchequer, 
announced a wide-ranging ‘climate levy’ 
on industrial carbon emissions. He pledged that 
the levy would benefit the environment and 
encourage investment in cleaner technologies. 

But a recent audit — one of the most far- 
reaching of its type in the world — suggests 
that the scheme’s bark was worse than its bite. 
The widely respected National Audit Office 
(NAO) reported in August that nearly 90% of 
its estimated impact occurred even before the 
levy started in 2001, as firms reacted to the idea. 
Subsequent reductions that could be attributed 
to the actual tax itself were small. 

“It appears that the announcement had more 
of an impact than the taxes,” says 
Tim Yeo, a Conservative MP and 
chair of the House of Commons 
Environmental Audit Commit- 
tee, which has an investigation of 
its own under way. The NAO esti- 
mated that the levy cut Britain’s 
total annual industrial emissions 
of about 60 million tonnes of car- 
bon by 3.1 million tonnes between 
1999 and 2001, but by only 400,000 
tonnes in subsequent years. 
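
The 'nearly 90%' figure can be checked against the tonnages quoted above. The following is a minimal Python sketch, assuming that the 3.1 million and 400,000 tonnes are the only two components of the levy's estimated impact; the variable and function names are illustrative and do not come from the NAO report.

```python
# Figures quoted in the text, in million tonnes of carbon per year.
CUT_BEFORE_LEVY = 3.1    # 1999-2001, after the announcement
CUT_AFTER_LEVY = 0.4     # subsequent years, attributed to the tax itself
TOTAL_INDUSTRIAL = 60.0  # approximate annual industrial emissions


def announcement_share(before, after):
    """Fraction of the total estimated cut that came before the levy began."""
    return before / (before + after)


def share_of_emissions(cut, total):
    """A cut expressed as a fraction of total annual emissions."""
    return cut / total


share = announcement_share(CUT_BEFORE_LEVY, CUT_AFTER_LEVY)
print(f"Share of impact before the levy started: {share:.1%}")
print("Announcement-era cut as a share of emissions: "
      f"{share_of_emissions(CUT_BEFORE_LEVY, TOTAL_INDUSTRIAL):.1%}")
```

On these numbers, a little under nine-tenths of the estimated effect predates the tax, which is consistent with the NAO's wording.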

“I don't think the levy is doing very much,” 
says Terry Barker, an economist at the Uni- 
versity of Cambridge and chairman of the 
consultancy firm Cambridge Econometrics, 
which has carried out several analyses of the 
tax’s impact for the Department for Environment, 
Food and Rural Affairs (DEFRA), which is 
responsible for its implementation. 


Soft targets 
As originally conceived, the climate levy 
would simply have taxed industrial users on 
their overall energy consumption. But the UK 
treasury, sensitive to charges from the Con- 
federation of British Industry (CBI) and other 
lobby groups that such a tax would damage 
the competitiveness of UK firms, agreed to a 
more complex implementation that seems to 
have reduced its effectiveness. 
Energy-intensive sectors such as brewing 
and bulk chemicals can win an 80% rebate on 
the tax if the sector as a whole is managing to 
meet emissions targets that they negotiate with 
DEFRA. For example, a 2001 agreement guar- 
anteed brewers the rebate if they managed to cut 
energy use on every pint of beer produced by 9% 
over seven years — which they then did. 


Gordon Brown's climate levy has had a modest impact on industry's carbon emissions. 


According to Barker, DEFRA 
officials negotiating these targets 
lacked detailed knowledge of the 
sectors in question and were usu- 
ally out-foxed by industry, agree- 
ing to targets that the sectors 
could meet without much effort. 

Matthew Farrow, head of environmental pol- 
icy at the CBI, disagrees with this assessment, 
saying that both sides lacked information on 
how easily emissions cuts could be achieved. 
“When the agreements were first negotiated in 
many sectors, there wasn't much data,” he says. 

The NAO reports that the resulting agreements were 
usually modest in scope. They typically required 
sectors to cut their emissions per unit of 
production — not of total emissions. This meant 
that industries could increase overall emissions 
if production increased, and not lose their 
rebate. And when sector targets were renegotiated 
in 2004, ten sectors out of 45 renegotiated 
agreements with DEFRA that exempted them 
from 80% of the levy as long as they met a 2010 
target that turned out to be actually above their 
reported 2004 emissions, when those figures 
became available. 

As a result, the agreements part of the levy 
has fallen short of its goals, according to the 
NAO. It was supposed to deliver a 2.9-million-tonne 
annual reduction in carbon emissions 
by 2010, but a revised estimate now puts this 
figure at 1.9 million tonnes. By contrast, it is 
estimated that rising energy prices and other 
factors will deliver a total of 2.4 million tonnes 
of annual reductions by 2010. 

According to the NAO, the impact of the levy 
on sectors excluded from agreements, such as 
retail and banking, is hard to gauge as they are 
not required to report their emissions. 

Environmentalists say that even the full levy 
of 0.44 pence on a kilowatt-hour of electricity 
— less than 10% of the cost of the energy 
— is too small to make a difference. “The fact 
is that it didn't significantly hit the bottom line,” 
says Mike Childs, head of campaigns at 
environmental group Friends of the Earth in London. 


“It appears that the 
announcement had 
more of an impact 
than the taxes.” 
— Tim Yeo 


Nevertheless, many believe the levy has had a 
positive impact. It forced many sectors to look 
at their consumption, says Paul 
Ekins, head of the environmental 
group at the Policy Studies Institute, a London- 
based think-tank. “Boards will have demanded 
reports on energy use, probably for the first 
time in their lives,” he says. And despite being 
critical of the tax, Childs and others still see the 
levy as a useful starting point for Britain — and 
an example for other countries. 

The UK government has announced that it 
will extend both the levy and the sector agree- 
ments until 2017. But Yeo says that his commit- 
tee, when it reports next month, may advocate 
the renegotiation of these agreements to give 
the climate levy some much-needed teeth. 
Not all species can be saved from extinction. Emma Marris talks to 
conservation biologists about prioritization and triage. 


Richard Cowling was playing with 
maps of South Africa on a computer 
screen when he had his epiphany. He 
was designing a conservation plan for 
the Cape Floristic Region, or fynbos, an arid 
landscape of shrubs and flowers that contains 
some 9,000 species, many unique to the area. 
Some of these, such as the mandala-like sunset 
blooms of the protea flowers, are spectacular. 
Some — like the geometric tortoises, whose 
fetching shells help them hide from baboons 
and secretary birds — are seriously endan- 
gered. Cowling, a conservation biologist at 
Nelson Mandela Metropolitan University in 
Port Elizabeth, was working on defining a set 
of reserves that would maximize the chances 
of conserving all those species. The project 
was so large that it would end up as a series of 
16 papers by 36 authors that occupied all 297 
pages of Biological Conservation’s July-August 
2003 issue. And it was also, Cowling realized as 
he stared at the screen, “sheer nonsense”. 
“I had to click on a couple of grid squares and 
the project would be complete,” Cowling says. 
“And it dawned on me: complete for whom? 
There was no way that this reserve would ever 
happen. It had to be linked to some social 
realities on the ground.” 
In the preface to his 1981 book, Extinction’, 
Paul Ehrlich, a biologist at Stanford University 
in California, provided a powerful parable for 
conservation biology: the story of the rivet 
popper. A passenger inspecting the plane he 
is about to fly in notices someone popping 
rivets out of the wings. When challenged, the 
rivet popper says that the passenger shouldn't 
worry because not all the rivets are necessary. 
For Ehrlich the rivets represent species and the 
rivet popper represents humanity, indifferent 
to the looming danger of ecosystem collapse 
and the end of the natural processes that sup- 
ply raw materials of life such as clean water, 
wild food, carbon sequestration and climate 
regulation. In the apocalyptic style for which 
he has become famous, Ehrlich predicted that 
continuing to pop the rivets of ecosystems 
would lead to “a crumbling of post-industrial 
society”. He demanded that the rivet popping 
be stopped. 

There aren't many, if any, conservation 
biologists who would disagree with that con- 
clusion. In principle. The problem is that they 
don't have the resources to back up such ambi- 
tion in practice. Spending on conservation by 
major international and non-governmental 
organizations has been estimated at around 
US$2 billion a year”. Given constrained 
resources, the biologists have to set priorities. 
‘Triage’ is a dirty word in some conservation 
circles, but like many dirty words, it describes 
something common. Whether they admit it or 
not, conservationists have long had to make 
decisions about what to save. 

As more and more admit it, open discussion 
about how the decisions are best made — by 
concentrating on particular species, or particu- 
lar places, or absolute costs, or any other crite- 
rion — becomes possible. Whichever criteria 
come into play, one thing remains constant. 
The decisions have to be made quickly. In the 
bloody business of conservation biology, the 
longer you pause to reorder your list, the more 
species will become extinct. 


Superfluous species 

Perhaps the most controversial basis for triage 
is redundancy — prioritizing those species that 
provide a unique and necessary function to the 
ecosystem they live in and letting go of those 
that are functionally redundant. It might seem 
sensible to lose a few rivets around the plane’s 
over-engineered windows if that saves the riv- 
ets actually holding the wings to the fuselage. 
This idea was raised in the early 1990s by Brian 
Walker of the Australian Commonwealth 
Scientific and Industrial Research Organiza- 
tion’. “Regrettable as it might be,” he wrote, 
“it is most likely that global biodiversity con- 
cerns will ultimately reduce to a cost-benefit 
analysis. Without knowledge of redundancy, 
or more broadly, the relationship between 
the levels of biodiversity and ecosystem func- 
tion, we cannot estimate either the costs 
or the benefits.” 

The majority opinion among conservation 
biologists today is that they still understand too 
little about ecosystem functions to say for sure 
which species are the ‘load-bearing’ ones whose 
presence keeps a complex, multi-tiered ecosystem 
from collapsing into some worst-case 
dull scenario of rats, roaches 
and invasive grass. “We are so 
fundamentally ignorant,” 
says Norman Myers, a fellow 
of the University of Oxford, 
UK, and adjunct professor 
at Duke University in Dur- 
ham, North Carolina. “We 
cannot afford, by a long, long 
way, to say which species are 
dispensable.” Andrew Balmford, 
a conservation biologist at the Uni- 
versity of Cambridge, UK, tends to agree: 
spotting key species is “an interesting exercise 
intellectually ... but by the time we've figured it 
out the forest will have gone anyway”. 


Save the genes 

Not everyone is quite so convinced the prob- 
lem is ineluctable. “I think there are a lot of 
systems where we know more than we think,” 
says Reed Noss, a conservation biologist at 
the University of Central Florida in Orlando. 
“If you can get naturalists to open up and talk 
about what they know, we can at least gen- 
erate some testable hypothesis and do some 
manipulation if we have time.” Kent Redford, 
head scientist at the New York-based Wildlife 
Conservation Society, agrees, up to a point. 
“Our big problem is that we have been raised 
to believe that unless you have complete infor- 
mation you cannot make recommendations, 
and I think that is something we are going to 
be put on trial for by our children. It’s baloney.” 
But his belief that science might make this 
sort of prioritization possible doesn’t mean 
he approves of it. “I don't care if something is 
redundant,” he says, “I want to save it for all 
these other reasons.” 

Perhaps aware of the resistance that func- 
tional prioritization might encounter, Walker's 
forthright paper suggested a complementary 
approach: taxonomic distinctiveness’. This 
turns out to be less contentious; although 
there are no organizations dedicated to sorting 
the load-bearing species from the non-load- 
bearing, there is at least one that dedicates its 
resources to saving the mammals that are phy- 
logenetically distinct. The EDGE programme 
— its initials stand for evolutionarily distinct 
and globally endangered — of the Zoological 


“I don't care if 
something is 
redundant. I want to 
save it for all these 
other reasons.” 
— Kent Redford 


Society of London argues for giving priority to 
endangered species of mammals that are far 
out on their own on the tree of life, without 
close relatives. 

The EDGE scheme gives each species a 
score derived from its position on a phyloge- 
netic tree. A lone species out on a long branch 
gets a higher score because it is the sole bearer 
of genes that represent a very long period of 
evolution. Take the three-toed sloths, 
which parted company with the 
rest of the sloths some 15 mil- 
lion years ago. “There are two 

species of three-toed sloth 
that only diverged 1 mil- 
lion years ago. If one went 
extinct, we would lose 1 mil- 
lion years, but if we lose both, 
we lose 15 million years,” says 
Nick Isaac, a research fellow at 
the Zoological Society who helps 
to run the EDGE programme’. 
“You could make an analogy with art,” 
says Isaac. “You are in a spaceship leaving Earth 
with three paintings. Do you take three Rem- 
brandts, or do you take one Rembrandt, one 
Leonardo and one Picasso?” The group’s top 
five targets for funding — which at this point 
amounts to paying for a student in the countries 
where the animals live to study their conserva- 
tion — are the Yangtze River dolphin (Lipotes 
vexillifer), the long-beaked echidna (Zaglossus 
bruijni) of New Guinea, the riverine rabbit 
(Bunolagus monticularis) of the Karoo desert in 
South Africa, the Cuban solenodon (Solenodon 


cubanus) and its cousin, the Hispaniolan 
solenodon (Solenodon paradoxus). Similar 
to each other, but distinct from anything else, 
the solenodons merit two slots. Conservation 
favourites such as tigers, pandas and gorillas 
are noticeably absent from the list. 
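
Isaac's sloth arithmetic can be written as a toy calculation. The sketch below is only an illustration of the reasoning in his quote, not the EDGE programme's actual scoring formula: it assumes that the history lost equals the age of the oldest lineage whose members are all extinct, and the species labels `sloth_a` and `sloth_b` are hypothetical placeholders.

```python
# Toy model: each lineage is (set of member species, age in million years).
# Ages are the ones quoted in the text; the species labels are placeholders.
LINEAGES = [
    (frozenset({"sloth_a"}), 1.0),              # one three-toed species
    (frozenset({"sloth_b"}), 1.0),              # the other three-toed species
    (frozenset({"sloth_a", "sloth_b"}), 15.0),  # the three-toed sloths together
]


def history_lost(extinct):
    """Age (Myr) of the oldest lineage whose members are all extinct."""
    extinct = frozenset(extinct)
    lost = [age for members, age in LINEAGES if members <= extinct]
    return max(lost, default=0.0)


print(history_lost({"sloth_a"}))             # one species lost
print(history_lost({"sloth_a", "sloth_b"}))  # both species lost
```

The jump between the two results is the point of the scheme: the second extinction costs far more than the first, so the last survivors of an old lineage rank highest.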

There are variations on this theme float- 
ing about. Redford suggests that when a spe- 
cies is identified as endangered, a priority 
list of populations within the species should 
be drawn up based on genetic diversity. And 
a biologist who considers his idea a little too 
hot to put his name to suggests putting species 
that have future evolutionary potential at the 
top of the list. This means prioritizing current 
species according to their capacity for future 
speciation. Big, long-lived species face inher- 
ent disadvantages under this idea: such a list 
would have little room for elephants or whales. 
Or redwoods. 


Battle of the maps 

A much more popular alternative to prioritiz- 
ing species is prioritizing areas. There is less 
need to know how the ecosystem works — just 
identify an area of interest and try to preserve 
it in its entirety. 

The first such scheme to gain real influence 
was Myers’s hotspot map, which has been 
published in several incarnations since its inception” 
in 1988. The original version, which prioritized 
tropical forests above all other places, was per- 
suasive enough for Conservation International, 
headquartered in Arlington, Virginia, and the 
MacArthur Foundation, based in Chicago, 


Two species of three-toed sloth — 15 million years of evolution. 
Illinois, to adopt it as a framework for their 
efforts. But like all prioritizing, it had its crit- 
ics: “I was told it was immoral, that all species 
are equal,” Myers recalls. 

The criteria he has used to define the 
hotspots are, Myers freely admits, somewhat 
arbitrary, and have evolved over time. In the 
2000 version an area makes the grade if it 
contains at least 0.5%, or 1,500, of the world’s 
300,000 plant species as endemics — that is, 
species that are seen nowhere else — and has 
lost 70% or more of its primary veg- 
etation®. In this iteration the Bra- 
zilian cerrado, the fynbos and 
other mixed grasslands joined 
the forests. 
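
The two-part test in the 2000 version is simple enough to state as code. The following is a minimal Python sketch using only the thresholds quoted above; the function names are illustrative, and the inputs passed to it would be a user's own estimates rather than figures from Myers's papers.

```python
WORLD_PLANT_SPECIES = 300_000  # world total used in the 2000 version, per the text
ENDEMIC_FRACTION = 0.005       # 0.5% of the world's plant species
MIN_VEGETATION_LOST = 0.70     # 70% or more of primary vegetation lost


def endemic_threshold():
    """Minimum number of endemic plant species (0.5% of the world total)."""
    return round(WORLD_PLANT_SPECIES * ENDEMIC_FRACTION)


def is_hotspot(endemic_plants, fraction_vegetation_lost):
    """Apply both criteria: enough endemics and enough habitat already lost."""
    return (endemic_plants >= endemic_threshold()
            and fraction_vegetation_lost >= MIN_VEGETATION_LOST)
```

Both conditions must hold, so a species-rich region that is still largely intact does not qualify, and neither does a heavily degraded region with few endemics.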

Myers’ hotspot map set a 
trend: it is now practically com- 
pulsory for every conservation 
organization to have its own prior- 
ity map. The Cape Floristic Region 
received its journal-filling loving-care 
from Cowling and his peers in part because 
it had made it onto so many of these priori- 
tization lists. As well as being an accredited 
hotspot under Myers’s scheme it had also made 
it into conservation group WWF's ‘Global 200’ 
scheme. Birds found nowhere else, such as the 
protea canary and the orange-breasted sunbird, 
had propelled the area onto Birdlife Interna- 
tional’s Endemic Bird Areas list’. 


Priority actions 
The fynbos demonstrates the extent to which 
maps will agree about things, which raises the 
question of why there should be so many. “It 
has been a not terribly profitable exercise over 
the last ten years to have such a proliferation 
of schemes that are basically very simi- 
lar,” says Georgina Mace, who runs the 
Centre for Population Biology at Impe- 
rial College in London, UK. “They act 
as sort of branding for the organizations. 
It still surprises me that the big conserva- 
tion organizations have not gotten together 
under a single banner, like Make Poverty 
History.” 

At the same time, partisans can detect — 
and defend, debate and disparage — vari- 
ous differences in approach. “We have 
been arguing, or certainly jockeying, 
to present one piece of science as 
more legitimate or stronger than 
another,” says Jon Hoekstra, a 
senior scientist at the Nature 
Conservancy in Seattle, 
Washington. These squab- 
bles are framed to sug- 
gest that there is one right 
answer — one most valid 
way to prioritize areas. 
“I was told it was 
immoral, that all 
species are equal.” 
— Norman Myers 


But different starting assumptions and differ- 
ent goals mean that many of the schemes are 
not directly comparable. “We have to remem- 
ber that they reflect the philosophical decisions 
made at the beginning,” says Hoekstra. 

The approach that currently enjoys perhaps 
the highest level of acclaim, at least scientifi- 
cally, is that taken by Hugh Possingham of the 
University of Queensland, in Brisbane, Australia. 
His one goal is maximizing the number of 
species conserved, and he loathes scoring 
systems. Instead he uses algorithms 
that measure real-world costs 
against benefits in terms of species 
number, and the resulting 
papers, colleagues say, are in a 

league of their own’. 

In his latest work he com- 

pares different actions in differ- 
ent places with each other, which 
is more complex than one might 
think. Land prices vary around the 
world, as does species richness. Many invest- 
ments have diminishing returns over time: 
once a large chunk of one ecosystem is pro- 
tected, turning a bit more into a park won't save 
many additional species. On the other hand, 
some interventions begin to pay off seriously 
only after a certain investment threshold is 
reached. “If you were trying to get all the rats 
off an island, unless you invest enough to get 
them all off, you might as well not even bother,” 
explains Possingham. On top of all this is the 
problem that data on costs are infamously 
scanty — so much so that many ear- 
lier analyses just used land area as a 
proxy, an astonishing simplification. 

“In a sense, it is just about good problem 
definition,” says Possingham. “If you don't do 
that right, you head down these scoring paths. 
The people who make them just have a feeling 
of which facts are important, and they throw 
them in.” Possingham tries to be as rigorous 
as possible, and sometimes that means not 
everything gets saved. “A lot of people get upset 
with that. It basically says some regions aren't 
working at all. They are too expensive, the 
threats are too huge, or there are not enough 
species in them.” 
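
The logic of weighing costs against species saved, with diminishing returns, can be sketched in a few lines. The Python below is a deliberately simplified illustration and not Possingham's published method: it assumes a made-up saturating benefit curve, two hypothetical regions (`cheap_rich` and `costly_poor`) with invented richness and cost values, and a greedy rule that hands each budget increment to whichever region gains the most species from it. It does not model the threshold effects described above, such as the rat-eradication case.

```python
import math


def species_protected(richness, unit_cost, spend):
    """Assumed saturating curve: each extra unit of money protects fewer new species."""
    return richness * (1.0 - math.exp(-spend / unit_cost))


def allocate(regions, budget, step=1.0):
    """Greedily give each budget increment to the region with the largest marginal gain."""
    spend = {name: 0.0 for name in regions}

    def marginal_gain(name):
        richness, unit_cost = regions[name]
        return (species_protected(richness, unit_cost, spend[name] + step)
                - species_protected(richness, unit_cost, spend[name]))

    for _ in range(int(budget // step)):
        best = max(regions, key=marginal_gain)
        spend[best] += step
    return spend


# Hypothetical inputs: (species richness, cost scale) for each region.
REGIONS = {"cheap_rich": (1000, 10.0), "costly_poor": (100, 100.0)}
plan = allocate(REGIONS, budget=20.0)
print(plan)
```

With these invented numbers the expensive, species-poor region receives nothing, which mirrors the uncomfortable result Possingham describes: a rigorous cost-benefit ranking can leave some regions unfunded.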


Mount Lofty's short straw 

Consider, for example, the Mount Lofty wood- 
lands of Australia, where eucalyptus trees shel- 
ter rare orchids (pictured), spiny echidnas and 
cockatoos. Surely it is worth preserving them 
from the invasive predators such as foxes and 
cats that threaten them? But in a trade-off 
between spending on the Mount Lofty ranges 
and on the montane regions of the fynbos, 
Possingham’s algorithms give the money to 
the fynbos — among other regional invest- 
ments. The Australian woodlands get nothing, 
despite the fact that Possingham, an 
avid birder, would bitterly regret losing part 
of the original range of the endangered regent 
honeyeater (Xanthomyza phrygia); he’s par- 
ticularly keen on honeyeaters. 

Putting this sort of insight into practice is 
not simple. Most Australian money isn’t trans- 
ferable to South Africa, any more than money 
given to preserve pandas can be spent on solen- 
odons. But some of those who administer the 
sliver that is fungible — people at the World 
Bank, the Global Environmental Facility and 
other large foundations — are taking an inter- 
est in Possingham’s approach. Peter Kareiva, 
the head scientist of the Nature Conservancy, is 
one of many researchers who has 
been co-authoring papers with Pos- 
singham on such return-on-investment 

models of conservation”; a few years ago, 

as it happens, he rubbished the whole idea of 
hotspots in American Scientist'®. 

Balmford, too, is excited about these 
approaches. “Possingham’s new techniques 
on setting priorities dynamically, allowing 

you to shift from one to another, are really 
exciting,” he says. The difficulty is getting 
them adopted by managers and decision- 
makers on the ground. “We have got to get 
away from conservation scientists handing 
down ideas from on-high to practitioners 
and expecting them to be received gratefully. 
It has got to be through examples, and from 
realizing from their peers that those things 
make sense.” 
This is the dreaded implementation gap, in 
which theory ignores practice and practice 
ignores theory. In the end, it may not matter 
which prioritization scheme is most scientifi- 
cally defensible. What matters is that the people 
carrying out a scheme feel that it makes sense 
and will save species. On this pragmatic basis, 
many schemes shouldn't even be considered 
for implementation, says Hoekstra — includ- 
ing some of his own work. “I wrote this crisis 
eco-regions paper. It gives some real interest- 
ing perspective on the world. It highlights the 
crisis in temperate grasslands. But I don't think 
it is as useful to look at the map I generated to 
decide where to work; you could end up trying 
to restore something that is lost.” 


Armchair scientists 

“So much of this stuff is done by well-meaning 
people sitting as it were in their armchairs,” 
says Stuart Pimm, a conservationist at Duke 
University. Pimm recently eschewed the pri- 
ority list for his own expertise and invested in 
some land in the Amazon he knew was ripe for 
conservation. “You have to do what you think 
you can do. It is going to be based on imper- 
fect information and it is going to be very, very 
strongly conditioned by local politics and eco- 
nomics and social conditions,” he says. 

Pimm aside, the armchair approach can seem 
deeply entrenched. Redford points out the per- 
ennial problem of papers that follow pages of 
science with a cursory command, “that deadly 
last paragraph that begins ‘managers should’”. 
For Noss, one solution is educating those managers. 
“We need a system that can provide mid-career 
training to people who are going 
to be working in land-management 
agencies, ocean-management 
institutions, and in environ- 
mental consultancies. Other- 
wise they are going to keep 
using these more outdated 
and less defensible approaches 
to prioritization.” 

For an on-the-ground 
conservationist, such as Stuart 
Cowell, project coordinator with 
Bush Heritage Australia, the many 
different schemes have been influential, but 
not immediately applicable. “We haven't taken 
those approaches off the shelf,” he says. Bush 
Heritage buys land with conservation value, 
but unlike the ideal maps on paper, some land 
is never going to come up for sale. What Cow- 
ell and his colleagues are asking themselves, 
he says, is: “Is there a benefit to an organiza- 
tion spending the time and resources in doing 
this sort of prioritization, which looks good in 
theory but perhaps does not take us as far as 
just some good expert knowledge?” 

There are some small successes. Possingham 


“We cannot afford, 
by a long, long way, to 
say which species are 
dispensable.” 

— Norman Myers 


has had some luck impressing government 
bureaucrats with the rigour of his analyses; 
some spending decisions in Australia have been 
made on the back of his work. And South Africa 
has had real success in bridging the implemen- 
tation gap. “The US and European style is that 
the scientists write it and hope someone picks it 
up, but the South Africans are trying to get the 
people who are going to implement it to help 
with the priorities,” says Redford. 


The messy reality 
Since his conversion experience over the 
digital maps of the fynbos, Cowling 
has been one of those attempting 
to build the input of decision-makers 
and local people into 
his schemes from day one. 
“The plans [I’ve worked on] 
were done not because they 
appealed to anyone’s curiosity 
in an academic sense but 
because they were needed,” he 
explains. He’s more interested 
in determining the possible than 
mapping the ideal. “Through the 
process of negotiation [with stakehold- 
ers] you end up with a series of projects, and 
funding is sought.” And sometimes that which 
is sought is actually found. 

Cowling says that getting all conservation 
biologists to do their prioritization work with 
both feet on the ground “will require a sub- 
stantial change in how researchers operate”. 
“Getting involved in the slushy stuff takes time. 
The kind of research is not likely to appear in 
the pages of high impact journals. You might 
get it into the pages of Ecology and Society,” he
says. But his work is not going unrecognized,
whatever its impact factor; Balmford singles
Cowling out for praise as someone “not just
concerned with getting the algorithm to get
the best bang for the buck, but with the more
messy, more real, more interesting reality”.

Preserving the Mount Lofty ranges
might not be as cost effective
as spending on conservation
elsewhere, despite the charms of
its endangered orchids (opposite).

©2007 Nature Publishing Group

There is no reason why, in theory, one could
not include the slushy stuff of real life as inputs 
in a prioritization scheme. “People say that 
this mathematical approach can't account for 
anything, but it can,” says Possingham. “The 
question is, can you put it in with a plausi- 
ble number?” Imagine a platonic scheme in 
which one could include the intransigence of 
a particular politician, the likelihood of a coup 
in a certain country, the relative value of the 
US dollar, the effect of eco-fatigue among the 
donating public, and the looming spectre of cli- 
mate change, each quantified and slotted into 
equations (along with values representing their 
uncertainty, of course). Such a marvel might 
give you the best tactics. But it would be no 
help in setting fundamental goals for future 
conservation — a subject on which unanimity 
seems about as likely as a full recovery for the 
Yangtze River dolphin. 
Emma Marris writes for Nature from 
Columbia, Missouri.

TO CATCH A WAVE


Ocean wave energy is trying to break into the renewable-energy market, 
but many challenges remain. Ewen Callaway reports. 


The North Sea is not known for calm
days, and neither is its inlet called
Nissum Bredning, 300 kilometres
northwest of Copenhagen. On a typi-
cal afternoon, windsurfers skate across the
grey-green water while birds seem to hover, 
frozen, in mid-air. Along the horizon stretch 
rows of giant white windmills, their long blades 
whirring in the gusts. 

“There’s really some good action today,” 
Per Steenstrup shouts over the gust as a mix 
of sea water and rain pelts his face. Steenstrup, 
an engineer from Copenhagen, is 200 metres 
offshore, aboard a steel platform with 20 large 
floats on each side lined up like oars on a 
Viking ship. A knee-high wave washes under 
the platform, and the floats move up and down 
a dozen centimetres in quick succession. 

Steenstrup opens the door to a small pre- 
fabricated structure on the platform, and sud- 
denly a mechanical roar rises above the noise of 
the wind and sea. “It’s a 40-piston engine run-
ning on waves,” he yells. In the control room
next door to the turbine, Steenstrup peers at
a computer that keeps an instantaneous pulse
on the Wavestar, as the platform is called. “At
the moment, the output of power is around 800
watts,” he says — enough to run a large-screen
plasma television. 

The wind turbines on the horizon, for their 
part, are components in a well- 
established system that produces 
41 megawatts of wind power. 
That’s enough on some days to 
power the entire agriculture- 
dense region. 

Yet Steenstrup and dozens like 
him think that power harvested 
from ocean waves will one day be competitive 
with other methods for extracting energy from 
the physical environment such as wind, solar 
and hydroelectric. Wave energy is applicable 
only in a few regions of the world, and uses 
technologies that, for the most part, remain 
unproven. But given the scale of the energy 
challenge facing the world, supporters say that 
wave energy could supply enough electricity 
to make it part of a green-energy portfolio. 


“Wave energy is where wind was 25 years ago.”
— Alla Weinstein


The European Ocean Energy Association in 
Brussels, for instance, estimates that the global 
resource for wave energy lies between 1 and 
10 terawatts; the world currently produces 
about 13 terawatts from all sources. Others 
see a more realistic number of 0.2 terawatts, or 
less, coming from wave energy; 
that’s still three times the cur- 
rent installed capacity for wind 
power worldwide. 

Whether wave energy becomes 
economical depends heavily on a
new round of open-ocean tests 
that are under way from Portugal 
to Wales to Oregon. Engineers and entrepre- 
neurs are field-testing machines that until now 
have been scale models in water tanks. Few are 
looking for profits; they just want to see if the 
technologies can produce a consistent amount 
of power from the ocean. Success could attract 
funds from investors, industry and utilities; 
failure could set the field back years. 

“A lot is riding on how well the first sets of
large-scale devices work,” says Tom Thorpe
of the consultancy firm Oxford Oceanics in
Grove, UK, which advises prospective wave-
energy investors and developers. “They've got
to be either successful or, if they fail, there has
to be a good reason why.”


Blowing in the wind 

Wave energy’s most obvious parallel — and, 
perhaps, competitor — is wind energy. “We 
are where wind was 25 years ago,” says Alla
Weinstein, director of ocean energy at Finavera 
Renewables in Vancouver, Canada. Finavera’s 
prototype, the Aqua Buoy, sank off the coast 
of Oregon last month after operating for two 
months and before it was scheduled to be taken 
out of the water later this month. A quarter of 
a century ago, world capacity for wind energy 
was around 90 megawatts. But that was a nine- 
fold jump from just two years previously, and 
today countries such as Denmark and Ger- 
many get more than a tenth of their power 
from wind — although it still accounts for just 
1% of energy produced worldwide. 

Global figures for the power that waves pro- 
duce are hard to pin down, as most projects are 
still in the testing phase. But when the indus- 
try’s leading company, Pelamis in Edinburgh, 
Scotland, gets a new project online in Portugal 
— as is expected within weeks — that will add 
2.25 megawatts from three machines. And
wave-energy proponents think that the growth 
could be exponential after that. 

Waves offer several improvements over wind, 
although they are trickier to harvest. Wind is 
notoriously fickle; when gusts fail, utility com- 
panies have to deliver power to their customers 
from other sources. Waves can be fickle too, 
but are easier to predict, says George Hager- 
man, an engineer at Virginia Tech’s Advanced 
Research Institute in Arlington, who forecasts 
wave-energy days in advance using weather 
satellites. Knowing when waves are coming and 
how big they will be can save utilities money by 
cutting down on the power they need to keep 
ready in reserve, he says. 

Water is not only more reliable than wind; it 
is also 800 times the density of air. Aboard the 
Wavestar, it’s not a hard concept to grasp. “This is 
a big guy,” Steenstrup says, eyeing an approach-
ing metre-high curl like a surfer would. It hits 
head-on, and sea foam gushes over the floats. 
The steel platform shudders, even though it’s 
anchored to the seabed by concrete piles. 

Like other renewable energies, wave power 
works better in some locations than others. 
It takes more than just a shore to harness the 
power of waves. Because of the planet's prevail- 
ing winds, the best spots are on the west coast of 
continents in the mid-latitudes of the Northern 
Hemisphere, or on the east coast in the South- 
ern Hemisphere. Not coincidentally, most wave- 
energy tests are being installed in those spots in 
the North Atlantic and North Pacific oceans. 

Some of the strongest waves hit the Orkney 
Islands in Scotland, where the European Marine 
Energy Centre has established a wave-energy 
test site two kilometres offshore. The centre, 
which receives both government money and 
private funding, offers developers steady waves 
and easy connection to electricity grids to field- 
test machines. Pelamis began testing a 750-kilo- 
watt wave machine there in 2004, although it is 
now gone. Four other manufacturers plan to 
join the site in the next two years. In the South- 
ern Hemisphere, Australia-based Oceanlinx 
has been testing a 600-kilowatt machine off 
Port Kembla, New South Wales, since 2005, and 
is working on a larger, 2-megawatt model. 

In such places, wave energy could provide an 
alternative source of renewable energy to the 
usual standbys of wind and solar. Given recent 
government mandates to increase the power 
generated from renewable-energy sources 
— the European Union is aiming for 20% 
from renewables, and California 33%, by 2020 
— wave power could be another much-needed 
option. The targets are aggressive enough that 
all options could be needed, says Dan Kammen, 
director of a renewable-energy laboratory at the 
University of California, Berkeley. 


THE POWER OF WAVES 


Engineers have developed at least six main types 
of machine to harvest the mechanical power of 
waves. Field tests currently under way should 
reveal which of these will ultimately be practical. 


Attenuator
This floating device effectively 'rides' the waves,
flexing as they pass.

Point absorber
This float absorbs wave energy from all directions
as it bobs up and down.

Oscillating wave surge converter
The tethered arm acts as a pendulum in response to
wave surges.

Oscillating water column
As the water level goes up and down, a column of
air is compressed and decompressed, powering a
turbine.

Overtopping device
Collecting water from waves in a reservoir, this
device powers a turbine as the captured water
drains away.

Submerged pressure differential
As this device responds to the waves, a pressure
differential is set up inside it, which is used to pump
fluid and so generate electricity.

Some energy companies and industrial
giants are already starting to take notice. At a
September conference in Porto, Portugal, repre- 
sentatives from the national utilities of France, 
Denmark and Portugal attended the usually sci- 
ence-focused meeting. In California, the utility 
behemoth Pacific Gas and Electric has sought 
permission to establish wave-energy test sites 
off the coast of northern California. And in 
2005, the hydroelectric firm Voith Siemens in 
Heidenheim, Germany, purchased Wavegen, a 
Scottish developer. 


Patents and promises 

For much of its 200-year history, wave energy 
has been flush with ideas but short on results. 
In 1799, French engineer Pierre Girard and his 
son filed the first patent to harness power from 
waves. Never constructed, the device was to 
work by linking the bobbing of moored ships 
to heavy machinery ashore via a plank and ful- 
crum. Through the nineteenth and twentieth 
centuries, patents trickled out of inventors’ 
workshops, but no machine ever produced 
enough power to gain widespread use. 

Wave energy’s supporters began moving 
out of garages and into government ministries 
in the early 1970s. The embargo imposed by 
the Organization of the Petroleum Exporting 
Countries (OPEC) propelled the price of crude 
oil from $7 in 1970 to $38 by 1974. Many coun- 
tries saw independence from Middle East oil in 
renewables, beefing up their research into wind 
and solar energy. When it came to ways to get 
energy from the sea, the United States mounted 
an ultimately unsuccessful effort to capture 
thermal energy from oceans by exploiting the 
temperature difference between deep and sur-
face water (see ‘Energy from the sea’, overleaf).
Meanwhile, Britain led the way in wave energy. 

In 1974, the UK government commissioned 
academia and industry to draw up plans for 
how the country could bring wave power 
into the mainstream. The first prototypes for 
wave-energy machines were “hopelessly uneco-
nomic”, says Thorpe. The plans called for mas-
sive machines that cost more than $100 million
each to generate 2,000 megawatts of electricity 
— roughly the output of the nuclear and oil- 
powered plants they were designed to replace. 
No rationales were given as to how such a tech- 
nological leap would be made, says Thorpe. 
None of the proposed plants was ever built. 

A 1983 progress report effectively ended 
Britain’s foray into wave energy, saying that 
the technology was unproven and too costly. 
Developers felt betrayed by the criticism, 
prompting the government to commission 
Thorpe to repeat the review; yet he came to a
similarly dim conclusion in 1992. 

As wind energy took off in the 1980s, wave
energy went back to its roots, in university
laboratories and inventors’ workshops. Les- 
sons learned from the early failures and from 
offshore oil rigs would guide the designs of a 
new generation of machines. 

Currently, at least 50 wave-energy projects 
are in development, with more appearing every 
year. Analysts divide the machines into more 
than half a dozen breeds (see graphic), each 
with a different trick to turn waves into elec- 
tricity. Pelamis’ resembles a giant snake with 
three segments that shimmy back and forth. 
Oceanlinx’s looks like a giant steel bagpipe 
that’s played by a rising and falling water col- 
umn. And Finavera’s are oversized buoys that 
use waves to drive hydraulic pumps. 

Such heterogeneity is natural for a field in 
its early days — but within a decade the vari- 
ous designs should shake out into those that 
are practical and those that aren't, says analyst 
Roger Bedard of the Electric Power Research 
Institute, a think-tank based in Palo Alto, Cali- 
fornia. “It's still anybody's game.” 

Thorpe is more sceptical. “There are lots and 
lots of ideas out there and hundreds and hun- 
dreds of patents,” he says. “Some of these actu- 
ally defy the laws of physics, many of them will 
not be technically viable, even more of them 
would not be economically attractive — and we 
are left with very, very few designs that I think 
have a chance, on a 10- to 15-year timescale, of
becoming economic.” 

Even now, the most promising designs can be 
washed under by the smallest technical glitch. 


A short drive from Steenstrup’s Wavestar rests 
a competing project: 237 tonnes of crimson- 
painted steel and concrete dotted with barnacles. 
Curled up onshore, the Wave Dragon resembles 
a giant piece of playground equipment, its steep, 
curved walls sloping up to a large concrete bed 
with an opening at its centre. When operating, 
the Wave Dragon floats in open water; waves 
gush over its wall and into a hole, where they 
power a turbine. 

Many see the machine as one of the indus- 
try’s leading prospects. Installed in Nissum 
Bredning in 2003, the 20-kilowatt 
device ran for 20,000 hours, says 
Lars Christensen, a developer 
in Wave Dragon’s Copenhagen 
office. But the machine has been 
ashore since early this year after 
a rusted screw put it out of com- 
mission. The screw, it turned out, 
should have been made of stain-
less steel. And Finavera’s Aqua Buoy was appar-
ently sunk by a pump that failed to remove
water once the device started leaking. 


Trial by error 

Such minor errors underscore the difficulty of 
engineering devices to withstand the demands 
of the open ocean. Waves come in all shapes 
and sizes, and most devices are designed to 
run on average ones. Yet to last years without 
regular maintenance, they must withstand 
swells twenty times more powerful. To counter 
such storms, many of the new machines are
moored loosely to the ocean floor, allowing
them to better absorb a pummelling. “The
ocean is really going to beat these things up,”
says Hagerman. “They need to be out there
for a few years to demonstrate that they can
survive.”

In 1988, a severe storm destroyed a 600-
kilowatt pilot plant made by Kvaerner Brug,
a Norwegian firm. The shore-based machine
— one of the few built in the 1980s — pro-
duced power for just three years, and the com-
pany later abandoned wave energy. Not to be
deterred, Wave Dragon plans to
install an even larger model next
year in Wales, where the waves
dwarf those in Denmark.

Although technical hurdles
could torpedo any one machine,
engineering alone is unlikely to
sink the whole field, says Thorpe.
He sees greater challenges in the
potential costs of developing and delivering
wave energy in the face of competition from
traditional and other renewable sources.
“Technically it will work,” he says. “Getting
the cost down is a significant challenge, and I
think some of them are going to be successful
— but not that many.”

“A lot is riding on how well the first sets of
large-scale devices work.”
— Tom Thorpe

Energy from the sea

Waves are just one way to extract
power from the oceans; at least
three other technologies attempt
to harvest electricity from the sea.
Tidal power, captured from the
tides as they rise and ebb, is a
resource estimated at 91 gigawatts,
according to a report from the UK
consulting firm AEA.

In the United States, most of
the tidal resource is in Alaska,
far from populations and energy
infrastructure. Other sites,
such as San Francisco Bay in
California, aren't ideal because of
competition with shipping lanes.
“Under the Golden Gate Bridge,”
says Roger Bedard of the Electric
Power Research Institute in Palo
Alto, California, “it ain't going to
happen.” Still, at least one firm
has managed to make tidal energy
work in a densely populated area.
Since 2006, Verdant Power has
worked to extract about 1,000
kilowatt-hours per day from the
tidal flows in New York City’s East
River, although the turbines are
now under repair (pictured).

More projects are under way in Europe.
France has kept its 240-megawatt
La Rance tidal-energy plant in
operation since the 1960s.
New tidal technologies use
turbines powered by beefed-up
propellers that spin as the tides
rise and fall. But in the French
plant, the tides pump water behind
a barrier. The ‘dam’ is then
released to create power.
Such designs have not gained
widespread use mainly because of
environmental concerns.

The sea’s largest cache of energy
is thermal, with an estimated
resource of 1.1 terawatts. The
idea is to get electricity by
tapping into the roughly 20 °C
temperature difference between the
surface water in tropical waters and
that pumped from depths of
1,000 metres or so. In the 1970s, the
United States invested heavily in
such thermal energy; several small
plants generating up to 50 kilowatts
were built in Hawaii, although funding
dried up in the mid-1990s. Lockheed
Martin is now exploring the
possibility of building new plants
in conjunction with the US Navy.

Finally, the chemical difference
between fresh water and sea
water holds energy in the form of
a salinity gradient that can create
water pressure and run a turbine.
The global salinity resource is
estimated at 220 gigawatts, but
the technology to tap this source
has lagged behind the others.
Statkraft, a Norwegian firm,
has now opened a laboratory in
Trondheim to look into it. The
largest technical barrier seems
to be the membrane that
maintains a salt concentration
between salt water and fresh,
according to AEA. E.C.

Sea snake: Pelamis has been testing its 750-kilowatt wave machine in the Orkney Islands in Scotland.

The UK-based Carbon Trust has esti-
mated that wave power costs between 25 and
91 US cents per kilowatt-hour. Investing some
£2.2 billion (US$4.6 billion) could bring the cost
down to 12 cents per kilowatt-hour, it estimates,
although that will depend on cost-cutting
innovations that will make or break the field.
Wind energy, however, can cost as little as 4 
cents per kilowatt-hour. The price of solar 
energy varies with location, but averages 
around 19 cents per kilowatt-hour for utility 
installations. 
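
To put the quoted cost figures side by side, here is a minimal Python sketch. The numbers are those cited in the text, in US cents per kilowatt-hour; the constant and function names are illustrative assumptions and do not come from any source.

```python
# Cost figures quoted in the article, in US cents per kilowatt-hour.
WAVE_LOW, WAVE_HIGH = 25.0, 91.0  # Carbon Trust range for wave power
WAVE_TARGET = 12.0                # Carbon Trust figure after roughly £2.2 billion of investment
WIND = 4.0                        # "as little as" figure quoted for wind
SOLAR = 19.0                      # quoted average for utility solar installations


def ratio(cost: float, baseline: float) -> float:
    """Return how many times larger `cost` is than `baseline`."""
    return cost / baseline


if __name__ == "__main__":
    print(f"Wave today vs cheapest wind: {ratio(WAVE_LOW, WIND):.2f}x to {ratio(WAVE_HIGH, WIND):.2f}x")
    print(f"Wave target vs cheapest wind: {ratio(WAVE_TARGET, WIND):.2f}x")
    print(f"Wave target vs average solar: {ratio(WAVE_TARGET, SOLAR):.2f}x")
```

On these figures, even the 12-cent target would leave wave power three times as expensive as the cheapest wind, although below the quoted average for solar.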

Desperate to diversify their energy supplies 
and narrow the gap between wave energy and 
other sources of power, European governments 
have started to offer subsidies and grants. The 
Portuguese government pays developers 
32.5 cents for every kilowatt-hour they put on 
the grid. It uses the same strategy to support 
other renewable sources. In Britain, the Marine 
Renewables Deployment Fund will give out up 
to £50 million to push devices to field-testing 
and commercialization. The United States 
offers no specific support to wave energy, but 
a bill to provide $50 million in funding per year 
for five years has been introduced in Congress 
by Representative Darlene Hooley (Democrat, 
Oregon). 

Unsurprisingly, many developers com- 
plain that governments haven't been gener- 
ous enough towards the field. According to a 
report from the UK consulting firm AEA, the 
members of the International Energy Agency, 
which includes nearly every country investing 
in wave power, spent just 0.3% of their renew- 
able-energy budgets on ocean energy between 
1974 and 2004. That’s equivalent to US$800 
million adjusted for inflation. 

So some private investors have opened 
their wallets. Nearly three-quarters of Wave- 
star’s recent funding round came from private 
investors such as the chief executive of Danish 
industrial giant Danfoss, and several wave-
energy start-ups are publicly traded. Yet Thorpe
worries that backers will flee if the field doesn't 
return money quickly. And with so many firms 
competing for attention and few side-by-side 
comparisons available, investors could throw 
their money at doomed projects. “A lot of peo- 
ple have invested in wave energy without taking 
a serious look at the economics,” he says.

Energy companies with outdated power
grids could also dash the hopes of wave-energy 
supporters. Designed to handle large, centrally 
located power plants, many utility company 
networks are unprepared for the dispersed
nature of renewable energies. “Historically,
their role in this was to say ‘over my dead body’,”
says Thorpe. “The last thing they wanted was a
wave-energy device on their network.” In some
countries, such as the United Kingdom, power
infrastructure is minimal in the remote coast-
lands with the biggest waves. The situation is
better in Portugal, which boasts an extensive
grid up and down its coast, which wave-energy
developers could theoretically plug into.

Bobbing buoys: Finavera’s Aqua Buoy used to
harvest energy from the waves near Oregon.


Go with the flow 

Although most environmentalists see wave 
energy as a valuable green alternative to fossil 
fuel, opposition has emerged from groups that 
compete for the ocean. In Oregon, local fisher- 
men wary of being pushed out of waters they 
have fished for decades initially opposed test 
projects proposed by Finavera and other devel-
opers. “There's no law that says that renew-
able power supersedes renewable food,” says
Terry Thompson, a county commissioner in 
Newport, Oregon, and a retired crab fisher- 
man. After Thompson brokered discussions 
between Finavera and his county's $100-mil- 
lion fishing industry, the firm installed its 
Aqua Buoy out of the way of fishing grounds. 
Yet Thompson worries about what might hap- 
pen when companies move from tests to full- 
scale wave farms. 

Surfers have also aired concerns over efforts 
to cut into waves before they’ve ridden them. 
In Oregon, a non-profit advocacy group, the 
Surfrider Foundation, has filed a complaint 
with the US government over Oceanlinx’s 
plans to build a wave farm off its coast. And 
in Cornwall, UK, surfers have complained 
about a wave-energy farm that is under discus-
sion for installation off the coast. But, as with
fishing, wise planning could side-step any 
conflicts with surfers, especially in hotspots 
such as Hawaii. “I wouldn't put it in the north 
shore of O’ahu,” jokes Christensen. “That 
would be suicidal.”

Back in Nissum Bredning, though, the wind- 
surfers and gulls remain unperturbed, and 
Steenstrup’s most pressing concern is fund- 
ing. To make the leap to commercialization, the 
Danish engineer needs to build a Wavestar five 
times larger than the platform his hopes now 
rest on. The first prototype will cost $11 mil- 
lion, he says. 

Waves will continue to crash on the shores 
of Nissum Bredning, Oregon and Portugal no 
matter what happens in the new round of tests. 
It should soon be clear whether the technol- 
ogy aimed at moving wave energy forwards 
can keep the concept afloat. 

Ewen Callaway is an intern in Nature's 
Washington DC office. 
See Editorial, page 135.

Time: research necessities
make it hard to keep track 


SIR — Plans to impose effort-reporting on 
scientists, as mentioned in your Editorial ‘On 
the paper trail’ and News story ‘Researchers 
criticized for poor time-keeping’ (Nature 
449, 508 and 512-513; 2007), will be difficult 
to implement. In practice, it is almost 
impossible to give an accurate estimate of 
effort, because scientists are rarely off the job, 
even when asleep. If they are not actually 
doing a particular task, they are planning the 
next, or puzzling over the most recent 
observation. How should that time be 
counted? In most research, the edges of a 
project are only known indistinctly. So in 
many cases it is difficult to know when one 
has wandered from one project to another
or into an unfunded area.

The definition of ‘100% effort’ causes 
another problem. This metric tends to be 
based on a 40-hour week, but not many 
funded scientists can afford to do so little 
work — 80 hours (200% effort?) is more 
common. Also, in these days of tight budgets, 
how should the project director account 
for the time spent on writing and revising 
new grant applications? Under the current 
funding system, a very large fraction of time 
is spent on this activity. 

Finally, it must be recognized that boot- 
strapping a new project with the funds of 
an existing project is built into the system.
If there is no venture capital available, new
grants have to be built on the backs of existing 
ones. After all, that’s how they got funded in 
the first place. When was the last time a new 
proposal was funded in the absence of any 
preliminary data? 

There may be some abuse in very large labs 
with multiple sources of funding. Filling in 
the blanks on applications and enforcing 
appropriate effort reports should be the 
norm. But please don't reduce the time 
available for research still further by making 
researchers account for every moment. 
Robert J. O'Connell 
Brudnick Neuropsychiatric Research Institute, 
303 Belmont Street, Worcester,
Massachusetts 01604, USA


Time: accounting problems 
caused by Caltech system 


SIR — You report on a National Science 
Foundation (NSF) audit of Caltech’s 
accounting system in your News story 
‘Researchers criticized for poor time-keeping’ 
(Nature 449, 512-513; 2007). The audit cited 
accounting deficiencies, in particular the 
handling of professorial effort as “voluntary 
committed cost sharing” as opposed to 
“voluntary uncommitted cost sharing”. 


Although faculty and staff were following the 
Caltech accounting practices that were in 
place at the time, your News story states that 
researchers “failed to report this to Caltech’s 
payroll system”. In fact, we did not have that 
option because of the deficiencies in the 
system. The audit report did state that 
“Caltech’s responses, once implemented, 
should address our audit recommendations”. 

Although the NSF report discusses 
interviews and facts related to several 
principal investigators and their grants, it 
does not name individuals. Even though the 
report does not refer to me by name, your 
News story associates information in the 
report with me personally. The NSF auditor, 
Joyce Werking, incorrectly recorded my 
statements about my time allocation in the 
report. The statements about me in your 
article are erroneous, taken out of context 
and unfair to me. Although Nature did 
attempt to contact me during the week before 
going to press, I was away at the time and 
unable to respond. 

Certainly, Caltech and other universities 
should increase their efforts to align their
accounting practices with agency regulations.
Also, NSF should improve its methods for
gathering and accurately reporting 
information. And Nature could have 
presented a more informed, responsible and 
balanced view.
Robert D. McKeown
Department of Physics, Mail Code 106-38,
California Institute of Technology, Pasadena, 
California 91125, USA 


Turkish science suffers as 
government vies with law 


SIR — Praise for the present Turkish 
government's work in fostering good science, 
in your Editorial ‘Turkey's transformation’
(Nature 449, 116; 2007), reads to my eyes like 
a cruel joke in the face of what is really going 
on in the country. It is true that the current 
government has increased the budget placed 
at the disposal of Tubitak, the main research 
agency under government control. It did so, 
however, by slashing the budgets of the 
independent universities. 

Tubitak’s new administration was 
appointed in a manner that was decreed 
illegal by the Turkish courts. As a result, the 
Turkish Higher Education Council advised 
the universities not to have any dealings or 
communication with Tubitak, because of the 
legal status of its administration. Therefore, 
no Turkish scientist can legally use a penny 
of the increased research budget. That some 
do use it, in violation of the law, is an act of 
desperation, because few other sources are 
left that can be used to sustain research and 
support students. For example, I had already 
had two projects accepted when the present 
administration was unlawfully appointed but
I withdrew them immediately; since then I 
have had no interaction with Tubitak. 

The bad health of Turkish science is further 
demonstrated by the party-political 
appointments made by the government 
within organizations such as the Geological 
Survey and the Atomic Energy Commission, 
as well as many senior and junior academic 
posts, which are not made on the basis of 
scientific merit. Many of these appointments 
are being challenged in the courts. Another 
source of serious headache for Turkish 
science is the minister of education's 
sympathy for ‘intelligent design’ and the way
that evolution is taught in our country. 

A. M. Celâl Şengör
Istanbul Technical University,
Eurasia Institute of Earth Sciences and
Department of Geology, Faculty of Mines, 
34469 Istanbul, Turkey 


Explorers’ challenge sunk
by Arctic warming


SIR — The so-called Northwest passage 
(between the Pacific and the Atlantic) has 
become fully navigable, as mentioned in your 
News story ‘Arctic melt opens Northwest 
passage’ (Nature 449, 267; 2007). It is worth 
recalling that when Roald Amundsen led an 
east-west expedition through the Northwest 
passage on the ship Gjøa, it took him two and
a half years to reach Gjoahaven (now called 
Gjoa Haven) in mid-August 1905. He wrote 
in his diary: “The North West Passage was 
done. My boyhood dream — at that moment 
it was accomplished. A strange feeling welled 
up in my throat; I was somewhat over- 
strained and worn — it was weakness in me 
— but I felt tears in my eyes.” 

In June 1940, the Royal Canadian Mounted 
Police vessel St Roch left Vancouver to sail the 
passage from west to east. It docked at Halifax 
on 11 October 1942. In 1944, the St Roch 
returned to Vancouver by a more northerly 
route, cutting the time down to just 86 days. 
More recently, icebreakers and ice- 
strengthened ships have on occasion 
traversed the route. But by the end of the 
2007 melt season, a standard ocean-going 
vessel could have sailed smoothly through, 
proof indeed that the Arctic summer ice is 
rapidly diminishing. 

A. J. (Tom) van Loon 
Geological Institute, Adam Mickiewicz University, 
Makow Polnych 16, 61-606 Poznan, Poland 


Rules of engagement 


Is there an inherent conflict between public debate and free scientific inquiry? Patrick L. Taylor argues that 
earning public trust is essential to defending scientific freedoms. 


Public engagement in scientific research has gone viral. Today, 
public consultation is invoked for subjects as diverse as war 
veterans’ responses to genomic research, responsible 
nanotechnology and the use of animal transplants in humans. It 
has also gone global, as demonstrated by the just-completed 
consultation on research using animal-human hybrid embryos by 
Britain’s Human Fertilisation and Embryology Authority (HFEA), 
and the Singapore government’s plan to consult on hybrid 
research and oocyte donation later this year. As groups of 
citizens mobilize and blog on science issues — from patenting 
to public health and drug development — it is time to 
reconsider the ground rules for public debates on science. 

Public engagement today can directly affect research. It has 
gone beyond debating controversial social impacts of applied 
science and technology, such as whether this nuclear power 
plant should be built or that pesticide be approved. It now 
delves into research methods that are unique to the laboratory, 
such as somatic-cell nuclear transfer and hybrids. 

Sceptical scientists fear that scientific means and ends might 
be determined by referendum, and that public debate will 
irrationally restrict free scientific inquiry. The occasional 
shrillness of public comment, or its seeming ignorance of 
scientific facts, can trouble scientists who are weary of the 
politics and misunderstandings they see plaguing important 
research; it can lead researchers to question the value of 
public engagement. 

In the public mind there is no such thing as ‘pure’ science. 

Some would prefer policies to be decided by government bodies 
alone. In other words, yes, the public has a right to decide 
whether it would fund a new atom bomb, but no, the public has 
no legitimate stake in limiting scientific freedom and 
professional judgment about research aims and methods. Yes, the 
public has a right to deliberate on certain aspects of 
scientific policy, but through government representatives, to 
ensure that public ignorance is disarmed, and scientists can 
advise government on scientific fact. 

This position is mistaken. It was legislative bodies and 
government executives that acted so irrationally in the Terri 
Schiavo case, in which a Florida woman in a vegetative state 
became the subject of politicized battles. Three US 
surgeon-generals have testified about presidential tampering 
with science policy. And although fears of the politicization 
of science are easily overblown, the time may come — perhaps it 
is already here — when […] protection against politicians who 
are selectively unfriendly to scientific freedoms. 

Public involvement is inevitable, whether invited or not. 
Sheila Jasanoff of Harvard University in Cambridge, 
Massachusetts, has shown, in her comparative political study of 
biotechnology in the United Kingdom, Europe and North America, 
that where public engagement is insufficiently available 
formally, it will occur informally, through public protest; 
market choices, such as consumer rejection of genetically 
modified foods; or new political structures, such as 
environmental movements. 

So the question is not whether public involvement should occur. 
The questions are how, with what impact, in what areas, and 
with what level of trust and precision? Some forms of 
engagement, as in California’s launch of stem-cell funding, 
have been successful: the engagement produced a social 
consensus sufficient for action, based on reasonably evaluated 
truths, which supported legitimate scientific work. Other forms 
have been less successful, increasing scientists’ frustration 
and public distrust. How should public engagement proceed, to 
be productive, credible and precise? 


Undermining ourselves 

We should start by recognizing where we in research 
organizations — scientists, administrators, lawyers, ethicists 
— harm public trust and engagement, through discrediting our 
own and others’ ability to be persuasive participants. 

When scientists advise governments, or publish, they often 
demand that the public, peers and editors treat them as 
disinterested, when in fact — in simple financial terms that 
the public understands — they are not. Scientists are surprised 
when the public discounts their views. On advisory committees, 
some scientists want to be funded by interested companies, yet 
they want to be publicly perceived […]. No matter how much we 
believe this, we cannot force the public to accept a 
combination that ordinary experience convinces them is suspect. 
Some lawyers and public figures polarize through exaggeration, 
whereas some ethicists’ credibility is sacrificed to 
sophistication and inaccessible arguments. Together, such 
‘experts’ convince the public that we cannot be trusted to 
arrive at a reliable consensus rooted in shared societal 
values, common sense and public intuitions. 

We know a lot about what the public fears 
from science and expects from responsible 
policy, and what forms of dialogue are effec- 
tive. The HFEA consultation teaches us’, again, 
what research surveys have taught us before*®. 
If in framing consultations we ignore that 
knowledge, we suggest that responsible self- 
regulation, and concerns about unforeseen 
consequences, social impact and disinterested- 
ness, have not occurred to us. Instead, we must 
explicitly address the prevailing perception 
that “scientific research does not pay 
enough attention to moral values”® 
and recognize that trust and charac- 
ter, as much as subject expertise, are 
critical to being persuasive. 

The best sort of public consulta- 
tion is flexible, including deliberative 
methods, surveys and interactions 
validated by social scientists. But even 
these methods can lead to frustrating 
outcomes. The HFEA consultation 
delivered what scientists wanted: UK 
licences for hybrid research will be 
considered on a case-by-case basis. 

On the broad question of the desirability of creating 
chimaeras, the consultation process admitted that its attempts 
at public engagement “were unsuccessful, as the public are 
reluctant to form a committed view without understanding the 
full context of the research proposals”. Queried generally 
about whether licences should be granted, nearly half of the 
public feared the slippery slope of what scientists might do 
next, and nearly half had concerns about meddling with nature. 
Many, it appeared, felt “far removed” from biomedical research. 


Question further 

The HFEA rightly used a combination of approaches, including 
representative opinion polls, deliberative sessions, written 
consultations and public meetings. But what if one went beyond 
asking the public open-ended questions about which types of 
embryo research are personally acceptable, or what types of 
licences should be given? 

Open-ended questions do elicit important responses, but imagine 
asking additional questions that demonstrate the questioner’s 
awareness of known public concerns, such as the needs to 
provide careful oversight, to be sensitive to runaway 
consequences and to offer precise alternatives based on the 
best professional judgment about solutions to reassure public 
fears. Asking, without conditions, whether licences should be 
granted is less precise than asking for public reaction to 
delineated alternatives. I believe that, if such reactions had 
been sought, more of the public would have endorsed the HFEA’s 
ultimate decision to conduct case-by-case reviews. 

In thinking long-term about public engagement — by which I mean 
a range of activities, not just direct consultation — here is 
what I think we should do. 

First, when soliciting public engagement, we must be clear what 
the public is being asked to contribute. We should invite 
perceptions of how society and quality of life will be affected 
under alternative scenarios, yet avoid relying solely on 
open-ended questions uninformed by potential solutions. In 
seeking to understand public perceptions and engage in open 
exploration of complex issues, engagement must build public 
confidence. This confidence will come with the knowledge that 
unexpected consequences and personal and social effects will be 
considered and transparently addressed, because this is where 
the public believes we fail. 

If we ask questions that are too conclusory — such as whether a 
power plant should be built next door — to a public unaware of 
how its concerns might actually be addressed, we underestimate 
the public, and it will respond by underestimating scientists’ 
good sense and good faith. A request for public input should 
distinguish factual points established by scientific methods 
and validated data from public interpretation of the meaning of 
those facts and goals. It should exchange ideas on the way that 
scientific objectives and solutions to public concerns can best 
be married. Collective deliberation is essential. Just before 
her death, the celebrated biologist Anne McLaren succinctly 
said this: “Education of the public is not enough... Let us aim 
for an informed dialogue, and let us hope that the media will 
do their best to make sure that nothing is ‘lost in 
translation’.” 

Public debate is no longer limited to the immediate effects of research. 

Second, we ought to encourage the public to share in 
understanding the wonder of scientific developments. Physicists 
have done this much better than biologists, thoroughly engaging 
the public in mysteries of space and time. Life scientists do 
some of this, around ecology, evolution, health and hopes for 
promised cures, but not enough. We need to do much more, 
engaging people’s imaginations in what development, genetics, 
molecular biology and physiology mean for knowledge, community 
and our world-view. There is a new world discovered every day 
in the biosciences, and we are not taking full advantage of the 
part that public imagination and fascination can play in 
supporting natural science for its own sake. 


Maintain our neutrality 

Third, we need to respect and actively support the neutrality, 
credibility and independence of bodies of scientific expertise, 
particularly advisory committees and academic journals. This 
means preserving the credibility of individual scientists, as 
well as the independence of journals and committees, by 
increasing transparency and avoiding personal conflicts of 
interest. It also means limiting ourselves to honest criticism, 
and ensuring that debates about scientific reporting in the 
media remain focused on objective scientific review of data and 
conclusions, rather than descending into partisan attacks. We 
need to challenge more consistently the erroneous statements 
made by politicians about what is scientifically known, 
especially through unflagging public responses by scientific 
and professional societies. 

Fourth, we must be continuously creative in 
public engagement. The whirlwind of scientific 
and biotechnological change must be met with 
complementary engagement, in which people’s 
need to know and evaluate can be grounded in 
intelligent understanding of possible solutions 
to their concerns. Scientists should abandon 
the head-in-sand hope, in this Internet age, that 
we can return to some golden era of politically 
insulated scientific seclusion. Abandoning real 
public engagement is not ending it. It is abandoning 
it to the very forces scientists fear. 
Patrick L. Taylor is at Children’s Hospital 
Boston and Harvard Medical School, Boston, 
Massachusetts 02115, USA. 
Benchmarks for ageing studies 


The hopes for improving human health during ageing are largely based on studies with animal models. But 
Linda Partridge and David Gems ask if we are learning the right lessons from ageing research. 


Ageing is complex. Diverse molecular and 
cellular damage accumulates over time, 
causing functional failure in different 
tissues. The process can seem to be intractable 
to experimental analysis or medical interven- 
tion — a pessimistic view overturned by the 
discovery of genetic mutations that can extend 
healthy lifespan in laboratory model organ- 
isms’. Perhaps even more surprising, these 
genetic effects seem to be conserved over large 
evolutionary distances, because mutations in 
related genes can extend lifespan in the yeast 
Saccharomyces cerevisiae, the nematode worm 
Caenorhabditis elegans, the fruitfly Drosophila 
melanogaster and the mouse””. 

Thus, despite their very different physiology 
and lifestyles, the simpler and shorter-lived 
yeast, worm and fruitfly can be used to help 
understand mammalian ageing. Mutations that 
extend lifespan can also reduce the impact of 
ageing-related diseases, including cancer, cardi- 
ovascular disease and neurodegeneration’. This 
raises the prospect that a single, underlying ageing 
process may act as a common risk factor for 
multiple diseases. Drugs that slow down ageing 
could therefore reduce the impact of many of 
the ageing-related diseases simultaneously. 

Yet the biology of ageing is a young field with 
emerging pitfalls (see table, page 167). Genetic 
mutations that increase longevity will generate 
exciting headlines, but experimental findings 
are only as good as the experimental design. For 
example, neglect of genetic background effects 
can lead to misleading or hard-to-interpret 
results’. If researchers are rescuing a lab strain 
from the effects of a disadvantageous environ- 
ment or genetic background, then we may be 
learning less about human ageing than we think. It is critical 
to learn the right lessons from this work if the field is to 
flourish and provide real insights into human ageing. 

The lifespan of model organisms (such as fruitflies) is affected by the lab environment. 

The model-organism approach has already told us much about 
other processes, such as development, immunity and behavioural 
traits. But compared to many developmental traits, such as the 
number of eyes or limbs, lifespan is more sensitive to the 
genetic make-up of different strains and to the environment. 
For instance, a mutation always occurs in a genetic background 
— the rest of the genome — and in some cases lifespan 
differences vanish when the background is made identical in 
mutants and controls. Often, organisms of the same strain, but 
from different laboratories, actually differ in lifespan. A 
goal of longevity research is to identify gene products that 
control lifespan and that could be good drug targets; however, 
reported effects of drugs on lifespan are themselves not always 
reproducible. 

Understanding the reasons for all this variability is a key 
challenge for the field. We need to investigate the robustness 
and repeatability of findings, to better control experimental 
variables and to establish some benchmarks for experimental 
work on lifespan. 

One immediate problem for measurements of lifespan is that 
death can be a surprisingly slippery endpoint. When measuring 
population lifespan, it is critical to discriminate between 
accidental and ageing-related deaths, and to exclude the 
former, which can be tricky. Flies can meet with accidental 
death from sticky food products and worms can die prematurely 
when young worms hatch inside their parent and devour it from 
within. 
Identifying the cause of death is therefore 
a somewhat subjective process, vulnerable to 
potential bias. So it is wise to work in ‘blind’ 
conditions — ignorant of the identity of the 
experimental treatment and according to 
stated criteria. Different observers should be 
able to obtain similar results. For mice, animals 
can be excluded from a study for veterinary 
reasons, and this should also be done accord- 
ing to a standard set of criteria, and reported 
in the literature. Such precautions might seem 
obvious, but they are routinely overlooked. 
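
As an illustration of why the handling of accidental deaths matters, the sketch below treats them as censored observations in a Kaplan-Meier estimate rather than dropping them. This is a standard survival-analysis technique, not one the article itself prescribes, and the function names and the example records are hypothetical.

```python
from typing import List, Optional, Tuple

# Each record is (day_of_exit, died_of_ageing). A False flag marks an
# animal that left the study for another reason (accident, veterinary
# exclusion) and is treated as censored on that day.
Record = Tuple[float, bool]


def kaplan_meier(records: List[Record]) -> List[Tuple[float, float]]:
    """Return (day, surviving fraction) at each day with an ageing death."""
    at_risk = len(records)
    survival = 1.0
    curve = []
    for day in sorted({d for d, _ in records}):
        deaths = sum(1 for d, aged in records if d == day and aged)
        leaving = sum(1 for d, _ in records if d == day)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((day, survival))
        # Censored animals leave the risk set without counting as deaths.
        at_risk -= leaving
    return curve


def median_lifespan(curve: List[Tuple[float, float]]) -> Optional[float]:
    """First day on which estimated survival is at or below 50%."""
    for day, surviving in curve:
        if surviving <= 0.5:
            return day
    return None


if __name__ == "__main__":
    # Hypothetical cohort: one fly stuck in food on day 12 (censored).
    cohort = [(10, True), (12, False), (14, True), (16, True), (18, True)]
    print(median_lifespan(kaplan_meier(cohort)))
```

Whichever rule is used, the point made above still applies: the criteria for calling a death accidental should be written down in advance and applied blind to treatment.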


Variance within strains 

The most pervasive problem in lifespan 
research is the presence of uncontrolled 
genetic differences between strains under 
study. Substantial natural genetic variation 
can exist across the genome among different 
individuals and strains. A mutant gene arose, 
at some point, in a single individual. That indi- 
vidual had a particular genetic background, 
which will be passed to progeny along with 
the mutant gene. 

To detect the influence of a specific mutant 
gene on lifespan, it is essential to place it in a 
genome that is otherwise identical to that of 
the strain used as the control. This is best done 
by repeatedly crossing carriers of the mutation 
with a standard genetic strain (Fig. 1, overleaf). 
In flies and mice it is often necessary to bring 
two different mutations together to produce 
the desired gene expression, thereby also cross- 
ing the two genetic backgrounds in which the 
mutations reside. This will introduce uncon- 
trolled genetic variation unless both strains 
have been previously backcrossed to the same 
control, and hence have the same genetic 
background as each other. For mice, where new 
mutants are usually introduced into a mixed 
genetic background for technical reasons, it 
can take years of backcrossing to establish the 
mutation in the control strain. 
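
A rough sense of how many backcross generations are needed comes from the textbook expectation that each cross to the control strain halves the remaining donor background at loci unlinked to the mutation. The sketch below encodes only that expectation; the function names and the 1% threshold are illustrative assumptions, and regions linked to the mutation are retained for longer than this simple rule suggests.

```python
def donor_fraction(generations: int) -> float:
    """Expected share of unlinked genome still from the original mutant
    strain after the given number of crosses to the control strain."""
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return 0.5 ** generations


def generations_needed(max_donor_fraction: float) -> int:
    """Smallest number of backcrosses bringing the expected donor share
    down to the requested level (hypothetical helper)."""
    if not 0 < max_donor_fraction <= 1:
        raise ValueError("max_donor_fraction must be in (0, 1]")
    n = 0
    while donor_fraction(n) > max_donor_fraction:
        n += 1
    return n


if __name__ == "__main__":
    for n in range(0, 11, 2):
        print(n, donor_fraction(n))
    print(generations_needed(0.01))
```

With generation times of months, as in mice, even a handful of such crosses accounts for the years of work mentioned above.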

An unfortunate and under-appreciated fact 
is that backcrossed strains can diverge geneti- 
cally from each other quite rapidly. Spon- 
taneous mutations occur regularly, causing 
gradual genetic divergence between strains. 
In addition, the presence of a mutation can 
lower reproductive success, so that natural 
selection acts on genetic variation elsewhere 
in the genome, causing it to diverge, often rap- 
idly, from the genome of the control strain. 
The effects of a mutation can therefore lessen 
with time. For example, the yellow mutation in 
Drosophila makes the fruitfly a golden colour 
and also greatly impairs the courtship behav- 
iour of male fruitflies. Female fruitflies from 
long-standing yellow stocks are more receptive 
to the courtship of yellow males, presumably 
because natural selection has acted to prevent 
the females from failing to mate. 

Strains can also diverge genetically in the 
lab because low numbers of individuals lead 
to inbreeding. This allows natural harmful 
genetic variants to become more common, 
resulting in lowered survival and fertility. This 
is particularly problematic whenever inbred 
strains are crossed with each other, for instance 
to produce targeted changes of gene expres- 
sion, because hybrid offspring typically have 
a longer lifespan than their parents. Thus, one 
should not assume that strains that were back- 
crossed several generations ago have remained 
genetically homogeneous; regular backcross- 
ing is required to achieve this. 


An imperfect world 

Genetic divergence between strains over time 
is also problematic when it comes to compar- 
ing results from different laboratories. In a 
perfect world, laboratories would all use genet- 
ically identical strains to ensure direct compa- 
rability of results and to eliminate uncontrolled 
sources of genetic variation. However, the real- 
ity is that laboratory stocks can differ substan- 
tially in lifespan, even when ostensibly from 
the same strain. For instance, the C. elegans 
strain N2, originally isolated from mushroom 
compost, is treated as the wild-type by conven- 
tion. Yet a comparison of ‘N2’ strains from dif- 
ferent laboratories revealed median lifespans 
ranging from 12 to 17 days’. This problem can 
be partly addressed by freezing strains that are 
not in use, to minimize divergence, but this is a 
major weakness of using Drosophila, for which 
attempted methods of cryo-preservation have 
been largely unsuccessful. To improve repeat- 
ability in fruitfly studies, laboratories may 
therefore need to share strains immediately 
before undertaking measurements. 

Figure 1 | Backcrossing model organisms. 
Repeatedly backcrossing mutants with controls 
will prevent genetic divergence and create 
individuals with identical genetic backgrounds. 

The variable longevity of strains raises a 
question that also applies to human ageing. 
What constitutes a ‘normal’ lifespan? In the N2 
worm study described above, only the longest-lived 
of the laboratory strains had a lifespan 
resembling that of wild strains’; the rest had 
accumulated life-shortening mutations during 
laboratory culture. This shortening of lifespan 
in lab strains is also seen in Drosophila’. The 
results for mice are less clear-cut, but sugges- 
tive of slower-maturing lab strains®. An addi- 
tional problem with mice is that inbreeding can 
shorten lifespan, and the standard lab stocks 
are deliberately inbred to produce genetic 
uniformity within strains. This problem can 
be avoided by the use of mice that are the 
product of a standard cross between multiple 
strains, which produces a standard but outbred 
mouse’. For the invertebrates, backcrossing to 
wild strains and the use of husbandry methods 
that maintain wild-type lifespans could help 
avoid declines in lifespan in the laboratory. 
More indirectly, genetic variation among 
strains can cause them to respond differently 
to mutations and to lab conditions. It is impor- 
tant to understand whether this genetic varia- 
tion is essentially normal, and if so, whether it is 
part of natural ageing processes, or instead the 
result of some laboratory artefact. For instance, 
in most strains of yeast, over-expression of the 
SIR2 gene extends lifespan, but one strain is 
completely unresponsive”, for reasons that are 
not understood. 

In other cases the interpretation is clearer. A 
meta-analysis of studies of Drosophila in which 
genes encoding antioxidants had been over- 
expressed showed a clear pattern: the lifespan- 
extending effects were greatest in experiments 
with the shortest-lived control strains, with no 
effect seen with the longest-lived controls". 
These data suggest that over-expression of the 
antioxidant gene restored normal lifespan in 
strains whose lifespan had been shortened by 
laboratory culture. Similarly, dietary restric- 
tion can produce substantial increases in mean 
and maximum lifespan in laboratory-adapted 
rodents. But a recent study found no increase 
in mean lifespan in wild-derived mice, which 
had longer lifespans and lower food intake 
— although there was a small increase in max- 
imum lifespan and a decreased incidence of 
tumours”. Disentangling the effects of natural 
genetic variation from the effects of laboratory 
culture is an important task for the future. 


External variation 

Environmental sources of variation are equally 
problematic. Laboratory environments are 
in some ways more benign and in others 
more dangerous than those in nature. Study 
conditions can cause the outcome to vary, 
and understanding this source of variation 
is hugely important for applying results from 
model organisms to humans. For example, the 
impact of natural enemies, including patho- 
gens, is greatly reduced in the lab, and there is 
a superabundance of food and little opportu- 
nity for exercise. 

Food can sometimes be harmful. C. elegans is generally fed on 
another model organism, the bacterium Escherichia coli. 
However, in nature the worm eats soil microbes, such as slime 
moulds, and E. coli has been shown to be mildly pathogenic to 
C. elegans. If the worms are fed a different bacterium, 
Bacillus subtilis, which is arguably more similar to the 
natural bacterial diet of the worms, their lifespan is extended 
and they respond less strongly to life-extending mutations. At 
least some of the increase in lifespan seen following dietary 
restriction in worms could therefore result from reduced 
exposure to food pathogens. Some laboratories use ‘disabled’ 
bacteria that are alive but cannot divide, to avoid this 
complication. 

Table | Pitfalls in ageing research. Rows: subjective decisions 
about age at death; genetic background can affect lifespan; 
reduced lifespan through inbreeding (one entry unknown; not 
applicable to worms, which are self-fertilizing 
hermaphrodites); genetic adaptation to laboratory conditions 
(two entries unknown); laboratory environment. Key: moderate 
problem, significant problem, important problem. Weightings 
reflect our impressions of the extent to which these factors 
have posed potential or actual problems for existing studies. 

Reproductive behaviour can also affect 
lifespan. In Drosophila, the presence of males 
can greatly shorten the lifespan of females and 
vice versa. It is thus important that any mat- 
ing regime is standardized when measuring 
lifespan, unless mating effects are specifically 
being studied, and the simplest way to achieve 
this is to work with the sexes separately. 

Given these many factors, we recommend 
that environmental variables are clearly speci- 
fied when describing methods, and wherever 
possible, procedures should be standardized, 
both during the rearing of strains and the 
measurement of their lifespan. We should also 
be aware that variables that seem unimportant 
to researchers may still affect the experimental 
subjects. For example, mice are highly sensitive 
to smell and to noise, which will vary between 
laboratories and are difficult to control. 

Ultimately, the findings of ageing research 
need confirmation in humans. The main goal 
of studies with model organisms is to generate 
hypotheses about the mechanisms of human 
ageing. Humans in many industrialized socie- 
ties have some similarities to laboratory model 
organisms. They are largely freed from the bur- 
den of infectious diseases, they are surrounded 
by a superabundance of food, 
many of them take little exer- 
cise and they voluntarily restrict 
their reproductive rate. Unlike 
laboratory model organisms, 
they have not yet undergone many generations 
of adaptation to this regime. So, rather than 
studying only lab-adapted strains, wild strains 
of model organisms examined under laboratory 
conditions may provide one relevant compari- 
son to human ageing. For mice, which are likely 
to undergo considerable stress during adapta- 
tion, work under controlled, but semi-natural, 
conditions could be revealing. 


Working on humans 
Work has also started in human populations 
on genetic associations with lifespan, age- 
ing-related diseases and other late-life traits. 
Studying the genetic basis of lifespan in 
humans presents some peculiar challenges 
beyond those normally associated with deter- 
mining the genetic basis of complex traits. The 
characteristics of older people, by definition, 
do not become apparent until they are old. By 
that time, many of the obvious control groups 
(siblings, spouses) are no longer available for 
study. The accumulation of environmental 
exposures during a long life — including the 
pre-natal environment — could be important, 
but unknown, factors in determining late-life 
health and survival. The genetic composition 
of populations in specific geographic areas can 
also change with time through immigration, 
resulting in a mixture of sub-populations with 
multiple genetic differences, making careful 
choice of control groups essential. 

Studies of twins, combined with long-term 
studies of the health of individuals, offer 
one way of circumventing some of these 
difficulties. They have shown that genetic influ- 
ence on mortality before the age of 60 is small 
and increases after that age, with genetic differ- 
ences accounting for about 25% of the variation 
in lifespan“. This natural genetic variation is 
comparable to what has been observed in 
model organisms. But we know that in some 
animal studies we can get a doubling of lifespan 
from a mutation in a single gene, and there may 
therefore be untapped potential 
for modifying healthy lifespan 
in humans. 

As the research field of 
genetic effects on ageing and 
lifespan starts to mature, the pitfalls and their 
remedies are becoming apparent. Aside from 
the need to make new discoveries, the key 
future challenges are to understand sources 
of variation, to deliver robust and repeatable 
findings and to make the studies of model 
organisms as relevant as possible to humans. 
Understanding sources of variation is challeng- 
ing, but once understood in context, they will 
enrich our knowledge of the complex process 
of ageing. 
Linda Partridge and David Gems are at the Centre 
for Research on Ageing, Department of Biology, 
University College London, The Darwin Building, 
Gower Street, London WC1E 6BT, UK. 
The twentieth century in a nutshell 


A sad and salutary tale of success, commerce, hubris, razzmatazz and scientific heroism. 


American Chestnut: The Life, Death, and 
Rebirth of a Perfect Tree 

by Susan Freinkel 

University of California Press: 2007. 

294 pp. $27.50, £16.95 


Colin Tudge 
Prominent among the many riches that suc- 
cessive waves of human beings discovered 
in North America was Castanea dentata, the 
American chestnut. This tree could be relied 
on to produce an enormous crop of edible nuts 
every year, unlike the oak and beech. It 
was also huge — more than 5 metres in 
diameter — and, although not as strong 
as oak or as pretty as walnut, it supplied 
timber for anything from telegraph 
poles to coffins and even, at a pinch, for 
pianos. The tannin in the wood stopped 
it rotting or could be extracted to treat 
leather, leaving fibre for making paper. 
The American chestnut grew abun- 
dantly. It was said that a squirrel could 
jump from chestnut to chestnut with- 
out touching the ground, all the way 
from Georgia to Maine. And, as Susan 
Freinkel remarks in American Chestnut, 
it “would pass over 1,094 places along 
the way with ‘chestnut’ in their names”. 
In the Appalachian mountains, the tree’s 
main stronghold, it supported an entire 
economy and culture. People ate the nuts 
and let their pigs and cattle loose to feed 
on them. They sent trains full of nuts and 
timber to the eastern cities. Ten million 
wild turkeys gorged on the Appalachian 
chestnuts. And the trees supported the 
now-extinct passenger pigeon — so 
numerous in the late nineteenth century 
that single flocks took several hours to 
pass overhead. 

Unfortunately, European Americans from 
the early nineteenth century onwards have 
tried to improve on the native chestnut. They 
introduced other species of Castanea that had 
bigger and fleshier nuts. Former US President 
Thomas Jefferson favoured the European 
species; others went for Asian types. And 
with the Asian trees came the blight. These 
trees were resistant, but the American species 
was not. The fungus was originally identified 
as the genus Cytospora, then reascribed to 
Diaporthe, then to Endothia. In 1978 it wound 
up in Cryphonectria, where it remains as 
Cryphonectria parasitica. 

The first signs of disease appeared in 1904 
in what is now the Bronx Zoo: dying leaves, 
then canker, then death. The Bordeaux fungi- 
cide mixture that had worked so well in French 
vineyards was of no use. By 1908 the disease 
was out of hand, and by 1911 it had spread 
to more than ten states. One of these, Penn- 
sylvania, created a ‘firewall’ by destroying all 
of its chestnuts in an unsuccessful attempt to 
contain the disease, spending $275,000 (about 
$5 million in today’s money) and inflicting 
much misery. They may even have signed 
the American chestnut’s death warrant by 
wiping out those trees that might have founded 
a resistant generation. By the end of the 1920s, 
the wild trees had all but gone. 

The American chestnut was wiped out by fungus. 

Ever since, various enthusiasts and profes- 
sional institutions have been trying to stage a 
chestnut come-back by means of three strate- 
gies. One is conventional breeding — cross- 
ing native American and resistant Asian trees 
to combine the best of both, or backcrossing 
resistant hybrids with pure Americans to 
produce second-generation hybrids that are 
75% American and 25% Asian, hoping that 
the Asian contribution includes the genes for 
resistance. Hypovirulence is another approach, 
infecting the blight fungus with a virus (dis- 
covered by chance in Italy) that greatly reduces 
its vigour, so that even American trees recover 
from its attacks. Blight-ridden American trees 
have been saved by infecting the active fungus 
with virus-ridden fungus. The third approach 
is to use genetic engineering to introduce genes 
for blight resistance — including synthetic 
ones. This is difficult because chestnuts — in 
contrast to, say, poplars — grow poorly in cul- 
ture. Like the giant panda, these trees seem- 
ingly resist the efforts of conservationists. 

The story of the American chestnut 
encapsulates the history of the twenti- 
eth century. We began the century with 
a tree that could do everything, and all 
we had to do was to treat it with respect. 
Instead, the entrepreneurs undertook an 
exercise in hubris, trying to improve on 
the unimprovable with sublime disre- 
gard for the complexity of nature. Then 
came the political razzmatazz: much 
posturing and rhetoric, and significant 
consignments of public money — all well 
intended but, in the end, horribly mis- 
guided. It would have been better to have 
done nothing (which is difficult for poli- 
ticians). Then there have been decades of 
scientific enterprise by heroic individu- 
als, some of whom sacrificed careers and 
income for chestnut breeding. 

The result? To celebrate Arbor Day in 
2005, President George W. Bush planted 
a hybrid chestnut outside the White 
House that was 75% American; it may 
have enough resistance to fend off blight 
but probably not the genetic wherewithal 
to grow to American size. The president 
told us that planting trees “is good for the 
economy and good for the environment”. 
Latest reports indicate that the White House 
tree is not thriving. 
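
The 75% figure quoted for backcrossed hybrids follows from simple halving arithmetic. The sketch below is only an illustration of that expectation: the function name is hypothetical, and it assumes each backcross to a pure American parent replaces, on average, half of the remaining Asian ancestry.

```python
from fractions import Fraction


def expected_american_share(backcrosses):
    """Expected American ancestry after a first cross and N backcrosses.

    Assumption: the first cross (American x Asian) gives a 1/2 share, and
    every later backcross to a pure American tree averages the hybrid's
    share with 1 (the pure parent).
    """
    share = Fraction(1, 2)  # first-generation hybrid
    for _ in range(backcrosses):
        share = (share + 1) / 2  # mean of hybrid parent and pure American parent
    return share


for n in range(4):
    print(n, expected_american_share(n))
```

One backcross gives the 3/4 share mentioned in the review. The figure is an average over the genome: it says nothing about whether the retained Asian quarter includes the resistance genes, which is why breeders can only hope that it does.
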
American Chestnut is a parable for our time: 
a sad and salutary tale, beautifully told by US 
science journalist Susan Freinkel. Parables lend 
themselves to different interpretations. Frein- 
kel says, “The American chestnut, successfully 
restored, would confirm that we have the power 
to make things right.” A potentially dangerous 
conclusion, as only with large slices of luck do 
we get away with our excesses. The lesson to be 
learned from this majestic tree, I suggest, is that 
we should aim to leave well alone. 
Colin Tudge is the author of The Secret Life of Trees 
and Feeding People is Easy. 

Dover trial documentary screens 


produced by NOVA & Vulcan Productions 
for PBS 
broadcast on 13 November on PBS 


Conflict between religion and science has rarely 
been of more concern. Whereas the rhetoric of 
Richard Dawkins, Sam Harris and others has 
little measurable effect, the outcome of a jury- 
less trial in a two-bit Pennsylvania town in 2005 
had a profound impact on how science is taught 
throughout the United States, and beyond. The 
parents of 11 pupils at the only high school in 
Dover launched a legal challenge to prevent 
the teaching of intelligent design as an alter- 
native to evolution by natural selection. There 
followed thefts, fires, death threats, a media 
sensation and a robust verdict. 

Hot on the heels of several books chroni- 
cling Kitzmiller vs Dover, comes Judgment 
Day, a rigorous television documentary from 
the producers of the prestigious science series 
Nova. This two-hour montage of interviews 
and reconstructions, to be shown on the Public 
Broadcasting Service (PBS) in the United States, 
features all the main players, bar one. Michael 
Behe, inventor of the specious meme “irreduc- 
ible complexity” and guiding light of the intel- 
ligent-design movement, refused to participate. 
His testimony — the cornerstone of the defence 
— revealed a definition of science so loose that 
it includes astrology. 

Herein lies the dramatic challenge of retell- 
ing this important story. The feebleness of the 
intelligent-design case, and the overwhelming 
strength of the prosecution in systematically 
deconstructing it, render the verdict clear just 
minutes into the programme. The makers of 
Judgment Day inject tension with eyewit- 
ness accounts from the people of Dover, and 
home-video footage of raucous school board 
meetings shows how passionate and divided 
this small community became. It works: it 
is inspiring to hear parents and educators, 
such as Sunday school and physics teacher 
Bryan Rehm, recount how they refused to be 
steam-rollered into bringing religion into the 
science classroom. 

Judgment Day gracefully avoids ridiculing 
intelligent design for the pseudo-intellectual 
fundamentalist fig-leaf that it is, by simply 
showing how the protagonists shot themselves 
in the foot. They plead for the teaching of 
“alternative theories” to strengthen children’s 
education (the misguided sentiment picked 
up by President Bush). But subpoenaed drafts 
of a textbook that promoted intelligent design 
reveal that the word ‘creationists’ was simply 
replaced with ‘design proponents’. In one 
instance, this alteration was made so hastily 
it caused the misprint ‘cdesign proponentsists’, 
satirized by the prosecution as the transitional 
verbal fossil linking creationism to intelligent 
design. 

At times in this overlong show, one feels 
almost sorry for the intelligent-design team, 
they're so inept. And then you remember that 
its champions took comments from scientists 
out of context and even lied under oath. 

The judge at the centre of the dispute, 
John E. Jones III, is the hero of the piece. When 
this republican lutheran, appointed by the com- 
mander-in-chief himself, was assigned to the 
case, the pro-evolution lobby feared they had 
been dealt an unsympathetic ear. Happily, the 
measured, dry-witted Jones was fascinated by 
the comprehensive scientific case for darwinian 
evolution. He handed down a damning judg- 
ment that intelligent design is not science, and 
that its teaching is a violation of the cherished 
First Amendment. As a result, Time magazine 
rightly put him in their 2006 list of the world’s 
100 most influential people. 

Intelligent design has not gone away. Next 
February, cinemas will be showing the pro- 
intelligent-design film Expelled: No Intelli- 
gence Allowed, written by comedian Ben Stein. 
Richard Dawkins, prominent anti-creationist 
blogger P. Z. Myers, and others claim the pro- 
ducers duped them into appearing. 

But the Kitzmiller vs Dover verdict, matched 
this September with the outlawing of intelli- 
gent design in the UK national curriculum, 
marked the official neutering of this unpleas- 
ant, sneaky movement in much of the west- 
ern world. Judgment Day is just the sort of 
thoughtful programming that celebrates 
how sensible people — faithful and other- 
wise — can use science and reason to combat 
fundamentalism. 

Adam Rutherford is podcast producer for Nature. 


For more information on the documentary, see 
www.pbs.org/wgbh/nova/id/ 

Escherichia coli bacteria and the skull of Phineas Gage, whose injury profoundly altered his personality: do model 
organisms, case histories and computer simulations have interesting similarities? 


Comparing modes of inquiry 


edited by Angela N. H. Creager, 
Elizabeth Lunbeck & M. Norton Wise 
Duke University Press: 2007. 287 pp. 
$22.95 


What do fruitflies, computer simulations and 
the daydreams of a psychoanalysis patient have 
in common? A lot, according to the editors of 
Science Without Laws. This essay collection 
contends that there is telling overlap between 
model organisms, mathematical models and 
exemplary narratives — such as the accounts 
of patients beloved by neurologist Oliver Sacks. 
As the economist and historian Mary Morgan 
argues, all these model systems that scientists 
use to explore the world are themselves objects 
to be inquired into, as well as being objects 
with which to inquire. So much so, indeed, 
that philosophers have begun to discover and 
catalogue them, as in this book. Model systems 
are, in effect, the species in the tangled bank 
of science today. 

Science Without Laws has three sections: 
biology, simulations and the human sciences. 
Each chapter analyses an exemplar and illus- 
trates its value as a model. The word ‘model’ 
is construed much more broadly here than in 
other similar books, such as Ronald Giere’s 
Science Without Laws (University of Chicago 
Press) or Mary Morgan and Margaret Mor- 
rison’s Models as Mediators: Perspectives on 


Natural and Social Science (Cambridge Univer- 
sity Press), both published in 1999. 

The collection thus reflects on science’s 
elaborately constructed descriptors and test 
beds, and tries to discern their dependence 
on each other. Morgan’s superb chapter on the 
prisoner's dilemma in game theory illustrates 
how a model can, in symbiosis with economic 
narratives, shake the theoretical principles of 
a discipline — much as an invasive species can 
disrupt an ecosystem. In an otherwise very 
nice chapter, Josiah Ober attempts to find 
analogies between Athenian democracy and 
laboratory mice. Not surprisingly, this is more 
of a stretch. 

With the exception of the book’s begin- 
ning and end, there is little cross-fertilization 
between chapters. This lack of exchange is a 
pity, as we shall probably learn the most about 
how we know by creating a comparative natu- 
ral history of modes of inquiry. An opportunity 
is missed, for instance, for interplay among the 
chapters on model organisms in biology and 
on simulations in geology and atmospheric 
science. Naomi Oreskes and Amy Dahan Dal- 
medico emphasize how scientific practices 
in atmospheric science shape and are shaped 
by scientists interacting with each other and 
with their patrons. Similarly, the study of the 
nematode worm Caenorhabditis elegans, the 
plant Arabidopsis, the bacterium Escherichia 
coli and the like have led to huge changes in the 
workings of biologists and in the expectations 
of their governments and corporate financiers. 


©2007 Nature Publishing Group 




But the fact that modes of inquiry evolve in 
a sociological theatre is largely absent in the 
essays on flies and worms. 

There is also a gulf between the biology and 
simulations sections and the one on the human 
sciences. These last chapters weaken the oppor- 
tunity for a theme that could unify Science 
Without Laws. Discussing worms, Jane Albert 
Hubbard makes the point that the genealogi- 
cal structure of life allows model organisms to 
act both as individually interesting cases and 
as sources for general inferences and implica- 
tions. In contrast, the chapters that conclude 
the book suggest that such a flexible role is 
more difficult for the case studies of psycho- 
analysis and history. In her afterword, Morgan 
is too generous in seeing a hint of exemplarity 
in John Forrester’s detailed account of the tri- 
als of a psychoanalytical patient and in Carlo 
Ginzburg’s pocket history of a failed Dutch 
merchant. 

Taxonomists are either lumpers or split- 
ters. The authors of this book are lumpers: 
they seek commonalities. But the diversity 
of science also requires judicious splitting. 
Our experience of trying to use the ways of a 
single scientific discipline, such as physics, as 
a template for correct thinking suggests that 
forcing too much epistemological common 
ground can be unwise. 

Carlos Martinez del Rio is professor of zoology 
and physiology at the University of Wyoming, 
Laramie, Wyoming 82071, USA. His latest book is 
Physiological Ecology. 

Dance of the forest 


Biome 

choreographed by Jodi Lomask 
premieres on 10 November in La 
Gran Via, San Salvador 


Michael Hopkin 

To a literalist, dance is a celebration of the 
human body. Of course, it can also allude 
to other organisms — even, in the case of 
San Francisco-based choreographer Jodi 
Lomask, to the relationships between them. 
Biome, a new work by Lomask’s company 
Capacitor, has its premiere in San Salvador 
this week. Like her previous pieces, it began 
with research scientists — here including 
the tropical-tree ecologist Nalini Nadkarni 
— sharing their insights. 

The result, Lomask stresses, does not attempt 
to convey scientific concepts; it uses the ideas as 
a jumping-off point. Audience members pre- 
occupied with looking for metaphors will find 
themselves frustrated. The goal, says Lomask, 
is to “draw attention to open spaces and the 
importance of pristine biomes”. 


Ecological phenomena are strongly refer- 
enced in the show’s preview film, shot in the 
Monteverde cloud forest of Costa Rica and 
screened at this year’s meeting of the Ecologi- 
cal Society of America in San Jose, California. 
A naked dancer unfurls herself towards the 
Sun, capturing with uncanny precision the 
jerky grace of a plant in a time-lapse movie. 
Another creeps, vine-like, up a huge tree; 
yet more lithe bodies cram themselves 
into a hollow trunk, like a colony of fun- 
gal parasites. “Both artists and scientists 
are working to show what's really there,” 
says Lomask. 

Engaging with the biological community 
is a change of direction for Capacitor. Her 
previous shows Digging in the Dark and 
Within Outer Spaces were inspired by geol- 
ogy and astrophysics. With performances 
at universities, theatres, nightclubs, schools, 
corporate events and fringe festivals, the 
company has reached a large number of 
people. The new show will add a shopping 
mall in Central America to that list. 
Capacitor’s next piece, called Urban Cano- 
pies, premieres in December 2008 at the open- 
ing of the new California Academy of Sciences 
building in Golden Gate Park. 
Michael Hopkin is a senior news reporter 
for Nature. 


View the preview at www.capacitor.org 


EXHIBITION 


Demonic deeds in symbiotic art 


Colin Martin 
“Science and natural history 
museums have increasingly become 
places of theatrical spectacle 
— venues for display in which 
negotiations for the meanings of 
objects take place,” says Bergit 
Arends, curator of contemporary 
arts at London's Natural History 
Museum. The museum's spectacular 
hall, designed by Victorian architect 
Alfred Waterhouse, has certainly 
provided the public with its fair 
share of spatial drama since it 
opened in 1881. Now the evolving 
function of the museum is marked 
by a resident artist's exhibition. 
Tessa Farmer got her inspiration 
for Little Savages from insects of the 
order Hymenoptera, which includes 
ants, bees, wasps and sawflies. 
She was captivated by the parasitic 
wasps, perceiving their similarity 
with the malevolent fairy creatures 
she has assembled from insect 
fragments over the past decade. 
Her exhibition has three parts: a 
sculptural intervention based on a 
stuffed fox; six beautifully observed 
pencil drawings of entomological 
subjects; and an animated film, 
An Insidious Intrusion, in which 
demonic insect-fairies spear an 
unfortunate stag beetle with 
porcupine spines before sawing it 
into lifeless fragments. 

Attacked from all sides, the fox 
has on its shoulders a bird grasping 
an insect in its beak, and what 
seems to be a crustacean clinging 
to its lower back. Swarms of insects 
hover around it: its ears are filled 
with larvae, pupae or wasp nests 
hang from its abdomen, beetles 
infest its flank, and its matted brush 
resembles a wasp's nest made out 
of wool. On a finer scale, a parasitic 
fairy forces a ruby-tailed wasp 
to lay its eggs in the fox's nose, 
while another with similar intent 
attempts to abseil into its mouth 
(pictured). 

“Looking at parts of the collection 
with Tessa has made me think 
of it more as a treasure trove,” 
says Gavin Broad, curator of 
Hymenoptera, in an endearingly 
frank account of his obsessive 
working life as an entomological 
taxonomist, “Tessa’s fairies seem 
just as real as ‘my’ wasps.” 

Farmer's work imaginatively 
reinforces how the museum's 
entomological fieldwork has 
evolved from its nineteenth-century 
preoccupation with collecting insect 
specimens for pinning or pickling 
— it now has 28 million — into 
an institution that is research- 
driven and focused on how insect 
communities interact ecologically. 

“In the institutional ecology of the 
museum, artists and scientists can 
develop a relationship of mutual 
benefit,” concludes Arends. “The 
role of artists is to disrupt engrained 
perceptions for the benefit of the 
museum, to change its course, and 
to reveal new knowledge in the 
process.” 
Colin Martin is a London-based 
science writer. 


Little Savages is at the Natural 
History Museum in London until 
27 January 2008. 

ESSAY 


Time to pick the fly's brain 


Drosophila transformed developmental genetics and cell biology. Now the fruitfly is poised to 
help biologists decipher how the brain works. 


Claude Desplan 


One afternoon in 1997, a colleague called 
me into his office to announce that Dro- 
sophila research was all washed up. The 
fruitfly had amassed fantastic successes as 
a model system for developmental biology. 
In less than a decade, biologists had used it 
to map the genetic and molecular network 
that governs organism development. Now 
my colleague thought that, having served 
its purpose, Drosophila must be about 
to reclaim its status as the odd little crea- 
ture obscure scientists used to manipu- 
late genetic characters. How 
wrong he was. 

In the early 1980s, basic 
research exploded. 
Working on Drosoph- 
ila was valued as 
a way to under- 
stand phenom- 
ena believed to be 
unique to flies, such as 
weird body deformations. 
If the Drosophila pio- 
neers believed that 
their work might have 
broader relevance, they could not have 
dreamed of its awesome implications. 

A huge breakthrough came in 1984 
with the discovery of the homeobox, the 
Rosetta Stone of developmental biology. 
This piece of DNA is shared by the genes 
that govern body pattern. Mutations in 
these ‘homeotic’ genes cause body parts 
to transform into one another — a leg 
grows where an antenna should be, four 
wings develop instead of two, and so on. 
This homeobox sequence was soon found 
to have similar functions in mice, humans 
and indeed most animals. 

The discovery transformed vertebrate 
developmental biology. Biologists identi- 
fied the molecules underlying what had 
been abstract concepts, such as morpho- 
genetic gradients — rising or falling 
amounts of a single protein that direct the 
development of different body parts. To 
the surprise of most, it became clear that 
Drosophila use the same developmental 
genes as vertebrates. For more than ten 
years, there was formidable excitement as 
each new fly gene clarified how a mouse 
embryo develops, an organ forms or what 
causes a human mutation. 

Things went so fast that by the late 
1990s it seemed that the fruitfly had 
reached its full potential for answering 


big questions, and that developmental 
biology should be left to those working 
on vertebrates. By then, fly scientists had 
made great advances that relied more on 
clever tricks and a century of genetics than 
on expensive equipment. A few drosophil- 
ists defected to plants or zebrafish in an 
attempt to adapt these techniques, but the 
fly stalwarts kept going. 

By the start of the new millennium, 
immensely powerful methods, such as 
the ability to generate a single mutant cell 

in an otherwise normal organism, 
helped solve the major challenge of 
the time: understanding how cells 
communicate. 


Most molecular cascades that bring a 
signal from the cell surface to the nucleus 
where it can change the fate of a cell 
were discovered in flies. Their function 
was then further analysed in cultured 
mammalian cells. 

In fact, the entire field of cell biology 
benefited tremendously from work in 
Drosophila. Logic says processes that con- 
cern single cells should be probed in yeast 
or cultured mammalian cells. But the fly 
oocyte and embryo proved to be ideal 
test beds for studying events of the single 
cell. A purely mechanistic description of 
molecular interactions was superseded 
by an understanding of how coordi- 
nated events take place in the context of a 
whole organism. 

Immunity is another area in which 
Drosophila had a surprisingly big role. 
Findings in flies uncovered the molecular 
pathways that recognize and then fight 
bacterial, fungal or viral infections with- 
out relying on previous encounters with 
the pathogen. The same molecules and 
pathways were later shown to have similar 
roles in humans, which led to the revival of 
innate immunity, a field with far-reaching 
applications in medicine. 

What can Drosophila still deliver that 
will change biology again? The time is ripe 
to use these tiny creatures to understand 
the biggest challenge in biology: how the 
brain works. Flies have a relatively simple 
brain that controls sophisticated behav- 
iours and can be analysed with machines as 
bizarre as a fly flight simulator. Drosophila 
research promises to solve how complex 

neural circuits in the brain mediate behav- 
iour (see page 193), now that researchers 
can manipulate single neurons and use 
sophisticated imaging of the working 
brain and electrophysiologi- 
cal techniques. 
This issue of Nature 
reports the whole- 
genome sequences of 12 
species of Drosophila, 
which will allow the 
organism that can 
be manipulated so 
exquisitely to have 
the best annotated 
genome. Compar- 
ing the genomes of 
closely and distantly related spe- 
cies will highlight which parts of a 
protein, or regions of DNA, have 
been conserved during evolution, and 
so are likely to have a function. This will 
offer a way to decipher the grammar of 
regulatory DNA, which has so far proved 
elusive. 
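
The comparative idea in the paragraph above, that positions preserved across species are candidates for function, can be illustrated with a toy conservation scan. This is a minimal sketch under stated assumptions: the sequences are invented, they are assumed to be already aligned and of equal length, and the function names, threshold and minimum run length are illustrative rather than taken from the genome papers.

```python
from collections import Counter


def column_conservation(aligned):
    """Fraction of sequences sharing the most common letter in each column."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    scores = []
    for column in zip(*aligned):
        top_count = Counter(column).most_common(1)[0][1]
        scores.append(top_count / len(column))
    return scores


def conserved_runs(scores, threshold=1.0, min_length=3):
    """Half-open (start, end) intervals where every score meets the threshold."""
    runs = []
    start = None
    # A trailing sentinel below any threshold closes a run that reaches the end.
    for i, score in enumerate(scores + [float("-inf")]):
        if score >= threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_length:
                runs.append((start, i))
            start = None
    return runs


# Invented toy alignment standing in for three species.
toy_alignment = ["ACGTTGACA", "ACGTAGACA", "ACGTCGACT"]
toy_scores = column_conservation(toy_alignment)
print(conserved_runs(toy_scores))
```

Real analyses weight species by evolutionary distance, since agreement between distant relatives is stronger evidence than agreement between close ones; the flat per-column vote here ignores that.
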

Powerful genetic tools, genomic 
advances and beautiful imaging of the 
tiny structure of a fly brain or embry- 
onic tissue have kept fruitfly research 
booming for more than two decades. By 
enabling us to pose biological questions in 
an in vivo context, the fly is an ideal subject 
for integrating molecular or cellular proc- 
esses in the biology of the whole organism. 
Drosophila research is thriving and should 
live up to our hopes for many decades 
to come. 
Claude Desplan is in the Department of 
Biology at New York University, 1009 Silver 
Center, 100 Washington Square East, New 
York, New York 10003, USA. 

SCIENCE & POLITICS 


A timely harvest 


The public should be consulted on contentious research and development early enough for their 
opinions to influence the course of science and policy-making. 


Pierre-Benoit Joly and Arie Rip 


Public engagement in emerging science 
and technology is thriving, particularly 
in the United Kingdom. Recent initiatives 
such as ‘Nanodialogues’, organized by the 
think-tank Demos, suggest that citizen 
juries, dialogue exercises and interactive 
public understanding projects can be 
fruitful for scientists and members of the 
public. Over two years, the Nanodialogues 
series allowed members of the public to 
join scientists in discussions on regulation, 
research funding, development and cor- 
porate innovation of nanotechnologies. 
Such enterprises may foster mutual 
understanding, but they can strug- 
gle to make a difference to research 
or to policy-making. 

Governments and research insti- 
tutions generally fail to respond to 
the outcomes of public engagement 
exercises, perhaps because the outcomes 
are often too late and too vague on con- 
crete strategies to move forward. We've 
learnt that it is better to engage the public 
‘midstream’, at a point in the 
research process when it 
is possible to incorporate 
their opinions into research 
orientation and policy- 
making. 

The French National 
Institute of Agronomic 
Research (INRA) used such 
an approach to focus on 
research into and field trials 
of genetically modified vines. 
In 2001, INRA had to decide 
whether to run field trials of a geneti- 
cally modified vine that is potentially 
resistant to a disease-causing virus. INRA’s 
research director for plant sciences, Guy 
Riba, voiced the opinion of most research- 
ers: “Surely scientists have a responsibility 
to carry out these experiments with a view 
to the future, even in the face of current 
public opposition?” 

INRA met strong opposition to the tri- 
als because of the cultural significance of 
wine in France. A group of wine producers, 
including some prestigious chateaux, had 
signed a petition in June 2000 calling for a 
moratorium on the use of genetic modifi- 
cation techniques in wine production, and 
joined forces to create the non-governmen- 
tal organization Terre et Vin du Monde 
(Land and Wine of the World). 

In response, INRA asked a group of 
social scientists who specialize in science 


and technology studies to organize a pub- 
lic consultation, in which we took leading 
roles. Our goal was to produce a public 
report to be taken into account in decision- 
making at INRA. 

Our working group comprised 14 peo- 
ple, including members of the public, wine 
growers and researchers. It had seven 
days of intensive discussions over a six- 
month period in 2002. The set of recom- 
mendations it produced was made freely 


available on the web. The INRA 
directorate prepared a public response 
explaining the decisions it intended to 
make and how these would accommodate 
the group’s recommendations. 

One outcome of the discussions was the 
creation of a local steering committee to 
follow up and give feedback on the field 
experiments taking place in Colmar, a 
town in the Alsace region of France. This 
committee has since grown into a forum 
for debate on various research options to 
fight vine viruses. 

The experiment was highly produc- 
tive. It yielded some unexpected recom- 
mendations that could be worked into the 
decision-making process. Some of the par- 
ticipants opposed the field trial at all costs, 
but most supported it under strict condi- 
tions, including: that INRA guaranteed that 
the trials would be used only for research, 
not for commercial purposes; that a local 
committee would be in charge of monitor- 
ing the experiment; and that INRA would 
commit to exploring alternative ways to 
fight viruses. Appropriately, it was not a 
smooth process, either during deliberation 
within the group, or in implementing the 
agreement. 

Researchers at INRA criticized the 
public consultation process for its power 
to reduce the freedom of research. Non- 
governmental organizations 

claimed that INRA was manipu- 
lating public opinion through 
the exercise. These tensions 
are an unavoidable part of the 
process. 

Three important lessons 
emerged from the exercise. 
First, midstream engagement 
is not a recipe for wide social 
agreement and acceptance. 

Rather, it improves the robust- 
ness of decisions by taking into 
account the diversity of world 
views and interests. Second, it 
stimulates institutional learning. 
Third, the process can produce 

research and development options 

not previously considered. This is 
of particular value if directors of 
public research are truly committed 
to generating beneficial sociotech- 
nical innovation. 

Public consultations in science and 
technology should be undertaken at 
a point early enough in the development 
process when it is still feasible to change 
course. The nanotechnology world often 
refers to ‘the lessons to be learned from 
genetic modification’ — the main one is 
timely, considered public engagement. 
Pierre-Benoit Joly is director of research at 
INRA, 65 Boulevard de Brandebourg, 
F-94205 Ivry, France, and director of the TSV 
(Social and Political Transformations related 
to Life Sciences) research unit. Arie Rip is 
emeritus professor of philosophy of science 
and technology at the University of Twente 
in Enschede, the Netherlands, and leads a 
programme on social and ethical aspects of 
nanotechnology. 

Correction 

In the Essay ‘Big lessons for a healthy 
future’ (Nature 449, 791-792; 2007) the 
conversion of £45.5 billion should have read 
US$93 billion, not million. 

QUANTUM INFORMATION 


Reality check 


Liesbeth Venema 


It will be a long experimental haul before the great potential of quantum effects can routinely be exploited 
for technological ends. A sense of practical purpose among researchers will encourage progress. 


When the citizens of Geneva cast their votes 
in the Swiss federal elections on 21 October, 
they could be confident that their ballots 
were safe — thanks to the rules of quantum 
mechanics. The poll results were sent down 
an optical fibre from the counting station to 
a government data centre, and their integrity 
was safeguarded by a quantum encryption key 
transmitted through the same fibre. Such a key 
promises to be 100% secure. It is composed of 
a stream of single photons that each take a ran- 
dom, unpredictable polarization state, and any 
attempts at tampering or eavesdropping will be 
noticed by the sender and receiver. 
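
Why tampering shows up can be seen in a small classical simulation. The article does not name the protocol used in Geneva, so the sketch below assumes a BB84-style scheme with two polarization bases; the function name, the basis labels and the intercept-and-resend attacker are all illustrative assumptions.

```python
import random


def simulate_key_exchange(n_photons, eavesdrop, rng):
    """Return (kept, errors) for one simulated run.

    kept   -- photons for which sender and receiver chose the same basis
    errors -- kept photons where the receiver's bit differs from the sender's
    """
    kept = 0
    errors = 0
    for _ in range(n_photons):
        bit = rng.randint(0, 1)            # sender's random key bit
        send_basis = rng.choice("+x")      # rectilinear or diagonal polarization
        basis, value = send_basis, bit     # state of the photon in flight

        if eavesdrop:
            # Intercept-and-resend: measuring in the wrong basis gives a
            # random result, and the photon continues in the attacker's basis.
            tap_basis = rng.choice("+x")
            if tap_basis != basis:
                value = rng.randint(0, 1)
            basis = tap_basis

        recv_basis = rng.choice("+x")
        measured = value if recv_basis == basis else rng.randint(0, 1)

        # Only photons measured in the sender's basis enter the key.
        if recv_basis == send_basis:
            kept += 1
            errors += measured != bit
    return kept, errors


for tapped in (False, True):
    kept, errors = simulate_key_exchange(4000, tapped, random.Random(2007))
    print(tapped, kept, errors / kept)
```

In this idealized model an untouched channel gives no mismatches, whereas the attacker disturbs roughly a quarter of the kept bits, so comparing a sample of the key reveals the intrusion. Real links also have noise and photon loss, which the sketch leaves out.
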

Quantum cryptography was high on the 
agenda at a meeting* last month on quantum 
information technology. Whether or not the 
exercise in Geneva was genuinely motivated 
by security concerns, it demonstrates, as a 
first public deployment of quantum crypt- 
ography, that the technique is ready to enter 
the commercial market for data encryption 
(G. Ribordy, id Quantique, Geneva; J. Dubois, 
Senetas, Melbourne). But a recurring theme 
of the meeting was the pressing requirement 
to identify other areas of practical application 
for quantum-information systems. 

Another major prospect is quantum com- 
puting. Quantum computers will not simply 
be faster versions of the computers we have 
today. Rather, they will carry out tasks that are 
hard to tackle with any classical approach: for 
example, factoring large numbers and search- 
ing databases. Algorithms for such tasks have 
been available for more than a decade, and it is 
realizing the hardware that remains the main 
barrier to progress. 

*QIPC 2007: International Conference on Quantum Information
Processing and Communication, 15-19 October 2007,
Barcelona, Spain.

The building-block of a quantum computer
is the qubit, a versatile version of the conventional
bit. Like its classical counterpart, a qubit
has two well-defined levels, ‘0’ and ‘1’. But it
also has the curious property that it can be in
both states at the same time, occupying them
with a certain probability. This phenomenon,
known as ‘superposition’, in principle offers a
powerful way to perform calculations because
several logic operations can be carried out
simultaneously. Another useful property of
qubits is that they can be entangled: that is,
their states can be prepared so that they are
correlated with each other, even though each is
unknown. For example, if one qubit happens
to be in level 0 the other one will occupy level 1,
and vice versa. The exact outcome becomes
fixed as soon as either of the two is measured.
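
In standard textbook notation, the two properties described above might be written as follows. This is only an illustrative sketch: the amplitudes \alpha and \beta and the qubit labels A and B are introduced here and do not appear in the article, and the equal-weight state shown is just one example of an anticorrelated entangled pair.

```latex
% A single qubit in superposition: measurement gives level 0 with
% probability |\alpha|^2 and level 1 with probability |\beta|^2.
\[
  |\psi\rangle = \alpha\,|0\rangle + \beta\,|1\rangle ,
  \qquad |\alpha|^2 + |\beta|^2 = 1 .
\]
% Two qubits A and B entangled so that their levels are always opposite,
% although neither level is fixed before a measurement is made.
\[
  |\Psi\rangle_{AB} = \frac{1}{\sqrt{2}}
  \bigl( |0\rangle_A |1\rangle_B + |1\rangle_A |0\rangle_B \bigr) .
\]
```

In the second state, finding qubit A in level 0 leaves qubit B in level 1, and vice versa, which matches the behaviour described in the text.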

A wide range of qubit designs (based on
atoms, ions, electrons, photons and even
superconducting currents) was highlighted at the
meeting. In most cases, at least two qubits can
now be connected so that some sort of logic
operation can be carried out. In one of the most
advanced approaches, in which qubits take the
form of ions trapped in an electromagnetic field,
up to eight qubits have been entangled with
each other. A new experimental development
is the construction of a so-called Toffoli logic
gate with ion qubits (R. Blatt, Univ. Innsbruck).
Toffoli gates are a familiar concept in classical
computation, but are the subject of renewed
interest because they may offer a solution to
error correction, a crucial consideration for
quantum computers. However, they require
three inputs and are therefore more difficult to
realize than the two-qubit logic gates
demonstrated so far.

The main problem with qubits is that their
quantum states are fragile, and quickly leak
away into the environment. This raises the
scaling issue. Coupling just a few qubits together
seems feasible. But as an increasing number
of them are connected, more quantum leaks
occur, so that information is quickly lost.
Part of the solution may lie in using photons,
relatively robust quantum entities, to channel
quantum information between remote
qubits, and experimental work is under way to
construct such optical quantum connections.
Instead of building a computer of, say, 5,000
qubits, a more realistic goal may be to optically
connect 1,000 quantum registers of just
five qubits: one for storage, one for communication
and three auxiliary qubits to ensure
fault tolerance (A. Sørensen, Niels Bohr Inst.,
Copenhagen).

Small-scale quantum computers, designed
to carry out a specific task, could be just a few
years away. But a take-home message from
the meeting was that more immediate applications
of quantum technologies are urgently
required to keep industrial partners interested
(T. Spiller, Hewlett-Packard, Bristol). The
‘killer application’ for quantum information is
not yet known, and more practical ideas need
to be generated to kick-start a new market. It
is sobering to realize that the inventors of the
transistor did not foresee the huge
integrated-circuit industry that would develop; their first
idea for a useful application of transistors was
in hearing aids. What we need to bootstrap
a quantum-information technology, says Spiller,
are quantum hearing aids.

When it comes to near-future technological
applications, quantum communication,
and quantum cryptography in
particular, seems to be the best bet. The record
distance over which a quantum key has been 
transmitted, both through an optical fibre 
and through free space, is about 150 km. But 
if quantum-communication technology is to 
be widely developed, it will be necessary to 
improve efficiency. At the meeting there was 
much talk of ‘quantum repeaters’ — pieces of 
hardware that can temporarily store and release 
photons without losing their quantum states, 
and that are seen as essential for the effective 
distribution of quantum information over large 
networks and distances. The experimental 
challenge to construct quantum repeaters is 
probably on a par with the challenge to gener- 
ate practical qubits. So far, quantum memories, 
the basic element of a quantum repeater, have 
been made from ensembles of cold gaseous 
atoms. But a solid-state form will eventually be 
required: atomic ensembles of rare-earth ions, 
inserted in a nonlinear optical waveguide, are 
among the first candidates to be investigated 
(M. Staudt, Univ. Geneva)’. 

The quantum future looks bright, although it 
will take a sustained experimental push before 
basic effects such as entanglement, inherent 
randomness and superposition can be exploited
in real devices (practical or otherwise). But
although quantum mechanics has been one of 
the most successful theories of the past century, 
nobody can confidently claim to understand 
why it works so well; for instance, how two 
entangled particles seem to communicate with 
each other at a distance, without any interaction,
is beyond anybody's comprehension. There is a 
nagging feeling that we are missing something. 
A quantum-information industry may indeed
be just around the corner, but its underlying
principles remain largely mysterious.
Liesbeth Venema is a senior editor at Nature.
e-mail: l.venema@nature.com

COMPUTATIONAL BIOLOGY


Protein predictions 


Eleanor J. Dodson 


Predicting the three-dimensional structure of a protein from its amino-acid 
sequence is a dauntingly complex task. But with colossal computer power 
and knowledge of other structures, it can be done. 


Fifty years have passed since the Nobel- 
prizewinning discovery that the amino-acid 
sequence of a protein determines its three- 
dimensional structure’ — yet computational 
biologists are still unable to predict the shape 
of a protein from its sequence. Given that there 
are many more protein sequences available than 
structures, and that protein shape is crucial for 
understanding cellular and physiological pro- 
cesses, a method for predicting such structures 
is vital. The paper by Qian et al. (page 259 of
this issue), in which the structure of a protein 
containing 112 amino acids is accurately pre- 
dicted, thus represents a real breakthrough*. 
The authors’ model was sufficiently accurate to 
act as the starting point in the X-ray structure 
determination of the protein. 

Most structural information on pro- 
teins is derived from X-ray and nuclear 
magnetic resonance (NMR) experiments. 
These have revealed the general character- 
istics of proteins — for example, sequence 
motifs that form secondary structural ele- 
ments such as helices and sheets. Such 
elements are organized to generate the 
overall protein architecture, mainly as a result 
of internal interactions between hydrophobic 
amino-acid side chains buried within the 
structure. 

The shape of a protein corresponds to the 
lowest-energy conformation of that mol- 
ecule and reflects the combined properties of 
the constituent amino acids. Low-energy 
conformations arise when the ‘backbone’ 
peptide chain is tightly packed into second- 
ary structural features. Such packing exploits 
both the hydrogen-bonding of amino-acid 
side chains to each other and the energeti- 
cally favourable, compact patterns that arise 
from hydrophobic interactions. Several crucial
side-chain interactions are almost always
found in certain structural elements of proteins,
but in general there is no simple correlation
between amino-acid sequence and protein
structure; quite different sequences can adopt
very similar folds.

*This article and the paper concerned were published online
on 14 October 2007.

Once a protein structure is known, it is 
fairly easy to see the atomic interactions that 
underpin it. But it is much harder to take 
an amino-acid sequence and work out the 
optimal interactions that determine how it 
will fold. First, it is necessary to quantify the 
energy contributions from various types of 


Figure 1| Model test. Qian et al.” have developed 
a computational method for predicting the 
three-dimensional structure of a protein from 

its amino-acid sequence. Here, their predicted 
structure (grey) of a protein is overlaid with the 
experimentally determined crystal structure 
(shown in colour) of that protein. The agreement 
between the two is excellent, with the amino-acid 
side chains overlapping particularly well. 
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interaction. The effects of molecular confor- 
mations on these contributions must then be 
assessed. But even a relatively small protein can 
have a bewilderingly large number of possible 
conformations. Although some progress has 
been made towards devising a structural predic- 
tion method, crystallographers have so far had 
no reason to worry about their job security. 

The field was greatly stimulated by a network 
set up in 1994 to provide a critical assessment of 
structure prediction (CASP)’. The main goal of 
the CASP network is “to obtain an in-depth and 
objective assessment of our current abilities and 
inabilities in the area of protein structure pre- 
diction”. Every two years the organizers provide 
the amino-acid sequences of a set of proteins 
for which undisclosed crystal structures exist. 
Modellers are challenged to predict structures 
for the proteins, and these are then assessed 
against the crystallographic results. The asses- 
sors use various scoring systems, but the most 
rigorous test is the one used by Qian et al.’ to test 
their own models — can the prediction be used 
in ‘molecular replacement’ searches’ that allow 
the raw data from X-ray diffraction studies to be 
related to the structure of the compound being 
investigated? Normally, a previously determined 
structure of a protein with a similar amino-acid
sequence is used for this purpose. 

Qian and colleagues’ models’ passed 
the molecular-replacement test with fly- 
ing colours (Fig. 1): one of the authors’ 
ab initio predictions was used successfully as a 
molecular-replacement model. Furthermore, 
the authors used their method to refine ten 
NMR models of protein structures, yielding 
results that were in better agreement with X-ray 
data than the original models. And finally, they 
were able to improve the molecular-replace- 
ment scores of several models that started from 
protein structures distantly related to that of 
the target protein. This gives the lie to the old 
crystallographers’ adage that computational 
modelling is a time-consuming way to make 
a poor model worse. 

The authors used a program called Rosetta 
to make their structural predictions. The pro- 
gram begins by mapping fragments of the 
sequence under review against existing infor- 
mation from previously determined struc- 
tures, to identify likely structural motifs. It 
then constructs many rough, low-resolution 
models from these fragments and tests them 
against energy criteria (which are dominated 
by hydrophobic interactions). In this way, a 
large set of possible low-energy conformations 
is identified, one of which is likely to be that 
adopted by the protein. 

At the next stage, the energy profile of every 
atom in the protein is incorporated into the 
low-energy models. Rosetta explores a huge 
range of randomly generated side-chain and 
backbone conformations, again calculating 
their effects on molecular energy. The resulting 
energy ‘landscape’ can vary dramatically — for 
example, small shifts of a single atom can make 
or break a hydrogen bond.

The whole procedure is iterative: solutions
are repeatedly assessed, the lowest-energy
models are clustered to identify common fea- 
tures and then further effort is concentrated 
on more variable regions. Rosetta also analyses 
peptide sequences found in analogous proteins 
from species of organisms other than that of the 
target sequence, as such proteins are expected 
to have similar three-dimensional structures 
to the target molecule. The whole process is 
terminated when conformations are identified 
that have significantly lower energies than the 
average energy of the protein. 
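
The idea of repeatedly proposing random conformational changes and keeping those that lower the energy can be illustrated with a toy search. The sketch below is not Rosetta's code or its energy function: the energy landscape, the list of 'angles' standing in for a conformation, and the Metropolis acceptance rule are all assumptions chosen here for illustration, since the article does not specify how Rosetta accepts or rejects trial conformations.

```python
import math
import random


def toy_energy(angles):
    """Hypothetical rugged energy landscape (not a real force field).

    A broad bowl plus ripples, so there are many local minima and a
    global minimum when every angle is zero.
    """
    return sum(a * a + 0.5 * (1.0 - math.cos(5.0 * a)) for a in angles)


def metropolis_search(n_angles=10, steps=5000, temperature=0.1, seed=0):
    """Random search over conformations with Metropolis acceptance.

    Returns (starting energy, best energy found, best conformation).
    """
    rng = random.Random(seed)
    current = [rng.uniform(-math.pi, math.pi) for _ in range(n_angles)]
    current_e = toy_energy(current)
    start_e = current_e
    best, best_e = list(current), current_e

    for _ in range(steps):
        # Propose a small random change to one randomly chosen angle.
        trial = list(current)
        i = rng.randrange(n_angles)
        trial[i] += rng.gauss(0.0, 0.3)
        trial_e = toy_energy(trial)

        # Always accept downhill moves; accept uphill moves occasionally
        # so the search can escape shallow local minima.
        delta = trial_e - current_e
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_e = trial, trial_e
            if current_e < best_e:
                best, best_e = list(current), current_e

    return start_e, best_e, best
```

Calling `metropolis_search()` returns the starting energy together with the lowest energy found, which can never be higher than the starting value. A real prediction would replace `toy_energy` with a physically motivated all-atom energy and would run many such searches in parallel.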

The algorithms used in Rosetta are sophis- 
ticated, and the computing resources required 
to carry out the calculations, to keep track of 
results and to plan future strategies, are awe- 
some. The authors therefore used a procedure 
called Rosetta@home’, which distributes the 
calculations across a network of home comput- 
ers — more than 70,000 in June 2006 — whose 
owners allow the program access to their idle 
machines. 

There is still much to be done. Cynics might 
mutter that one success doesn't prove that Qian
and colleagues’ method is truly general, and
that it should be assessed further using known
structures. Nevertheless, this approach dem- 
onstrates real progress in several respects: 
the use of enormous computational power; 
the exploitation of known three-dimensional 
structures; the development of powerful search 
algorithms that relate those structures to new 
sequences; and the steadily improving tactics 
used to determine low-energy conformations 
of molecules. The benefits will be seen in 
structure-based drug design and in improved 
models for crystallographic calculations. And 
in the future, this method might provide struc- 
tural information about intractable molecules 
that are difficult to study experimentally.
Eleanor J. Dodson is at the York Structural Biology 
Laboratory, Department of Chemistry, University 
of York, Heslington, York YO10 5YW, UK. 

e-mail: e.dodson@ysbl.york.ac.uk

MATERIALS SCIENCE


Magnetic blue 


Jeroen van den Brink and Alberto F. Morpurgo 


A commonly used blue dye is more than just a pretty colour. This material 
and its relatives are semiconductors, and their magnetic properties can be 
controlled by engineering their crystal structure. 


Organic compounds are rarely magnetic, but 
metal phthalocyanine (MPc) materials are 
notable exceptions to this rule. Reporting in 
Advanced Materials, Heutz et al. now show
that the magnetism of MPcs can be controlled. 
By changing the crystal structure of an MPc 
film, the authors switched the material from 
being in a magnetically ordered state to a non- 
magnetic one. This approach might provide 
a method for customizing the magnetism of 
molecular materials. 

MPcs are flat molecules that take the shape
of a four-leaf clover. They consist of an outer 
ring, formed from nitrogen, carbon and hydro- 
gen atoms, with a metal atom bound at the cen- 
tre (Fig. 1). The first molecule of this class was 
discovered at the beginning of the twentieth 
century, and had a copper atom in the middle. 
Because of its brilliant blue colour, the com- 
pound was immediately seized upon for use 
in paints and dyes. The hue also inspired the 
name ‘phthalocyanine’, which was taken from
the Greek-derived words for rock oil (naphtha) 
and blue (cyan). 

Since then, more than 70 MPcs have been 
synthesized, each with a different central atom 
or group of atoms. The properties of these
compounds vary widely. For example, simply
attaching chlorine atoms to the aromatic rings 
in copper phthalocyanine (CuPc) modifies 
the electronic absorption spectrum of the mol- 
ecules. This process is used to add subtle green 
tones to blue paint. No great conceptual leap is 
required to see that similar structural modi- 
fications to MPcs could result in compounds 
with other interesting properties. 

In fact, chemists have long known how ver- 
satile MPcs can be. Apart from their common 
use as dyes in the textile and paper industry, 
they can also act as catalysts, and they have 
even been investigated as anticancer agents. But 
perhaps their most interesting characteristics 
are their magnetic and electronic properties. If 
a transition-metal atom is placed in the centre 
of the ring, MPcs carry a magnetic moment 
because of the particle spin of the transition- 
metal atom. The spin value varies depending 
on the metal used, so that MPcs can be thought 
of as nanomagnets, the magnetic strength of 
which can be controlled at a molecular level. 

Physicists are only just starting to explore
systematically the full potential of MPc-based
compounds. The leitmotif of this work is the
addition of electrons to the materials to probe
changes in their electrical and magnetic properties.
Of particular interest is the unexpectedly
large number of electrons that can be hosted by
MPcs: up to four or five on a single molecule.
The resulting charge density can be tuned by
adding electron-donating atoms (such as lithium,
potassium or rubidium) to the materials.
This ‘electron-doping’ technique has also
been used on buckminsterfullerene (C60), the
famous football-shaped carbon molecule that
has been a fertile playground for condensed-matter
physicists for almost two decades.

Figure 1 | Structure of metal phthalocyanines.
Metal phthalocyanines are a class of molecule
that comprises an organic, four-leaf-clover
structure with a metal at the centre. Heutz et al.
show that the magnetic state of films of these
molecules can be switched by controlling their
crystalline structure.

The latest experiments on MPcs have pro- 
duced some surprises. These compounds are 
usually semiconductors, but several MPc films 
turn into metallic conductors when electron- 
doped with potassium atoms’. The variation 
of conductance with the amount of potassium 
incorporated into the films provides informa- 
tion about which molecular orbitals the donated 
electrons occupy in each MPc (ref. 3). Other 
experiments reveal that the magnetic properties 
of manganese-containing MPcs can be tuned by
varying the concentration of lithium dopants’. 

The big idea behind all this work is that it 
should be possible to engineer the electronic 
properties of solids by chemical actions at a 
molecular level. This proposal is certainly not 
new. But researchers who have attempted this in 
the past have almost invariably been confronted 
with a harsh reality: small molecular modifica- 
tions made to tune the bulk electronic proper- 
ties of a solid often cause drastic changes to the 
packing of the molecules in that solid. Such 
changes to the crystal packing cannot usually
be controlled, and can easily eclipse any chemically
induced changes to the electronic properties.
In most cases, this has prevented systematic
progress. MPcs, however, are different from 
other compounds; in most MPc crystals, the 
molecules are packed in a similar way, so that 
structural effects are less likely to be a problem. 
Heutz et al.’ now show that not only do the 
crystal structures of MPcs create fewer prob- 
lems, but they may also be turned to advan- 
tage. The authors controlled the crystal form 
of thin MPc films by growing them on differ- 
ent substrates; this allowed them to exploit the 
fact that the interaction between the magnetic 
moments of neighbouring molecules depends 
on the relative orientation of those molecules. 
On one of the substrates, the orientation of the 
molecules in the film is such that the interaction 
between magnetic moments vanishes. In this 
case, the MPc nanomagnets point in random 
directions and do not create a net magnetic 
moment. But on a different substrate, the molecules
were slightly rotated with respect to their
previous orientation. This is enough to switch 
on the magnetic interaction between the mol- 
ecules. In other words, the authors engineered 
the macroscopic ordering of the nanomagnets 
by controlling the crystal form of the material. 
The authors’ findings are the latest in a
growing body of work that explores the unique 
properties of MPcs. It is now clear that these 
compounds display a rich interplay of elec- 
tronic, magnetic and structural properties, with 
potential technological relevance. Stimulated 
by this experimental activity, theorists have also 
started to investigate electron-doped MPcs, 
and have made some startling predictions. For 
example, they propose that, under appropriate 
conditions, these systems may become super- 
conductors at high temperatures’, adding to the 
already impressive roll-call of MPc properties. 
It is always difficult to predict how any 
field of research will develop. But it is clear that 
multifunctional MPcs are potential building- 
blocks for future materials. We would not be 
surprised if physicists soon start to approach 
their colleagues in chemistry and materials- 
science departments asking for a greater variety of 
well-characterized MPc compounds. At present, 
MPc materials may just be the toys of blue-skies 
researchers. But once they make it out of the 
playroom, they could become invaluable.

A twist on periodicity at Saturn


Margaret Galland Kivelson 


Saturn's nominal rotation period is timed by a ‘radio clock’ that counts 
bursts of emissions controlled by the planet's magnetic field. Buffeting 
by the solar wind may explain the clock's irregularities. 


Einstein showed us that measurements of time 
made in systems moving at different speeds 
will not agree. What would he have thought 
of a clock whose ticking depends on how fast 
the solar wind — a gas of charged particles, or 
plasma, constantly flowing outwards from the 
Sun — is blowing? Yet that is what, on page 265 
of this issue, Zarka and colleagues tell us is the
case for the radio-emission clock that has been 
used to infer Saturn's rotation period. 

Time-keeping on planets is linked to their 
orbital periods (‘years’) and their rotation 
periods (‘days’). Strangely, it is not straightfor- 
ward to determine the rotation periods of the 
gas-giant planets: Jupiter, Saturn, Neptune and 
Uranus. These planets lack solid surfaces with 
features to track as the planet rotates. Images 
of the gas giants allow one to track clouds, 
but their motions are not precisely tied to 
the rotation of the interior. How, then, do we 
establish rotation rates? 

A particularly fruitful approach has been 
to monitor the intensity of radio-frequency
emissions from sources close to the planet. Such
emissions arise in a planet's magnetosphere, the 
region of space dominated by its magnetic field 
(Fig. 1), and their intensity depends on the angle 
between the observer and the magnetic field at 
the source. If the planet’s internally generated 
magnetic field is asymmetric about its spin axis, 
the direction of the field at the source will seem 
to nod up and down as the planet spins, and 
the intensity of the observed emissions will vary 
with the rotation period. The planetary rotation 
rates of Jupiter’, Neptune and Uranus have been 
identified in this way. 

Saturn emits radio signals modulated at a 
period of about 10.75 hours. This period has 
been used to define the period of rotation of 
the interior’, but it has proved hard to under- 
stand why the radio power varies periodically 
because the best available measurements fail to 
detect any asymmetry of the internal magnetic 
field*. Possibly even more puzzling is the recog- 
nition that the period of the modulation is not 
fixed. The first hints of that, initially greeted
with some scepticism, came from measurements
made by the Ulysses spacecraft. More
recently, observations by the Cassini orbiter
have confirmed that the period changes by
as much as 1% or so in months to years.

Figure 1 | Generation of radio emissions at Saturn. Saturn’s magnetosphere is embedded in the solar
wind, here shown as flowing away from the Sun at speeds that increase and decrease periodically (the
faster-flowing portions are depicted in dark orange). Radio emissions are produced where electrical
current flows into Saturn’s auroral ionosphere. Zarka et al. show that the power of the radio emissions is
modulated periodically, and that the period varies with the speed of the solar wind, possibly because the
currents are generated by wave-like disturbances at shifting locations along the magnetospheric boundary.

So, today, we know that the radio power is 
periodically modulated but we do not under- 
stand why. And we know that the radio period 
drifts too rapidly to be consistent with changes 
in the rotation period of the deep interior of a 
massive spinning planet. Other types of analy- 
sis give an estimate of the rotation rate of the 
deep interior that is distinctly shorter than 
the shortest period inferred from the radio 
clock’. Thus, it seems likely that the radio clock 
responds to processes in the planet's upper 
atmosphere and magnetosphere. Explanations 
of the varying period of the radio clock have 
appealed to changing conditions that are either 
external to Saturn's magnetosphere (such as the 
speed of the solar wind’) or internal to it (such 
as the mass injected from the vapour plume of 
Saturn’s small moon, Enceladus’). But evidence 
that such effects cause the observed drift in the 
period has been sketchy. 

Zarka et al.' use roughly three years of 
Cassini radio-wave data to provide compel- 
ling support for the hypothesis that external 
effects contribute to the modulation of the 
radio period. They find that the total power 
within a defined range of radio frequencies 
integrated over a full Saturn rotation period 
of 10.75 hours fluctuates on timescales of 
about 20-30 days. The properties of the 
solar wind are known to fluctuate at the solar 
rotation period of 25 days, and also to show 
trends over longer timescales. Zarka et al. find 
that cross-correlations with the speed of the 
solar wind are high, especially when Cassini's 
colatitude (the difference between its latitude 
and 90°) remains relatively constant, relative to 
Saturn's spin axis. The correlations with other 
properties of the solar wind (such as dynamic 
pressure) are weak. 
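
A cross-correlation of the kind mentioned above can be sketched in a few lines. The example below is only an illustration of the general technique, not the authors' analysis: the function names are hypothetical, and both time series are synthetic, built here with a 25-day period and an arbitrary 3-day delay so that the expected answer is known in advance.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lagged_correlations(driver, response, max_lag):
    """Correlate driver[t] with response[t + lag] for lag = 0..max_lag."""
    n = len(driver)
    return {lag: pearson(driver[:n - lag], response[lag:])
            for lag in range(max_lag + 1)}


def best_lag(driver, response, max_lag):
    """Return the lag with the highest correlation, and that correlation."""
    corr = lagged_correlations(driver, response, max_lag)
    lag = max(corr, key=corr.get)
    return lag, corr[lag]


# Synthetic series (made-up numbers, for illustration only).
DAYS = 200
PERIOD = 25.0   # days; the solar rotation period quoted in the text
DELAY = 3       # days; arbitrary delay chosen for this example

wind_speed = [400.0 + 100.0 * math.sin(2 * math.pi * d / PERIOD)
              for d in range(DAYS)]
radio_signal = [5.0 + 2.0 * math.sin(2 * math.pi * (d - DELAY) / PERIOD)
                for d in range(DAYS)]
```

With these synthetic inputs, `best_lag(wind_speed, radio_signal, 10)` recovers the 3-day delay that was built in. Real spacecraft data would be noisy and unevenly sampled, so the peak correlation would be lower and its significance would need to be tested.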

Given evidence that the source of the radio 
emissions seems to be localized in the morning 
to noon sector’, it was previously proposed’ 
that changes in the period of the radio clock 
would occur if the source location shifts with 
changing solar-wind velocity. Such shifts could 
arise (and vary systematically with solar-wind 
velocity) if the emissions are triggered where 
the magnetospheric boundary becomes 
unstable through the growth of a phenom- 
enon known as Kelvin-Helmholtz waves (the 
equivalent for magnetized plasma of wave- 
breaking when high winds blow over water). 
This interpretation remains speculative: the 
new results do not establish a mechanism for 
the changing periodicity. But the knowledge 
that the radio period is modulated by the speed 
of the solar wind should help in the quest for a 
more complete understanding. 

Many planetary scientists expected Saturn's 
magnetosphere to be a bloated but rather 
boring analogue of Earth’s. The data being 
collected by Cassini continue to belie this
expectation. Ten years after its launch from
Earth, the mission continues its fruitful exploration
of a planetary system that is dramatically
different from any previously investigated.

FISHERIES


Nets versus nature 


David O. Conover 


The life-histories of pike adjust quickly to shifts in the opposing forces of 
fishing and natural selection. Such rapid changes suggest that evolutionary 
dynamics must be incorporated into fisheries management. 


People like to catch big fish, sometimes so 
much so that fish sizes overall become greatly 
diminished. According to one view, the contin- 
ual removal of large fish from a population sets 
the stage for rapid, undesirable evolutionary 
changes, including slower growth, earlier adult 
maturation and permanently smaller size’. 
This occurs because removing the largest fish 
directly opposes natural selection, which tends 
to favour large size. 

What happens when these two forces simul- 
taneously oppose one another? Can evolution 
respond quickly enough to track changes in 
fishing selection, or does natural selection 
counteract it? Writing in Proceedings of the 
National Academy of Sciences’, Eric Edeline 
and colleagues illustrate the outcome of this 
dynamic tug-of-war between the forces of 
natural selection and fishing selection. 

Until now, the theory underlying the 
management of fisheries has been based on 
ecological models that predict how the pro- 
ductivity of an exploited population changes 
in relation to its density, and the age and size 
at which fish are caught. The goal is to ensure 
a maximal but sustainable catch in perpetu- 
ity. Current management approaches do not 
take into account the potential for evolutionary 
change in response to fishing. 

Why is evolution important to fisheries man- 
agement? It could be argued that fishing merely 
adds an additional predator to the ecosystem. 
But from the fish’s point of view, humans turn 
the rules of engagement completely upside 
down. Most natural predators attack smaller 
fish more frequently than larger fish. The 
bigger a fish gets, the lower its mortality 
(Fig. 1). Hence, growing fast early in life is a 
good strategy. Moreover, because big fish 
produce many more offspring than small 
fish, delaying maturation to larger size also 


©2007 Nature Publishing Group 


Natural mortality 


Selection gradient 


Mortality rate 


Fishing mortality 


Fish size 


Figure 1| The darwinian struggle between 
natural selection and fishing selection. The 
graph depicts the contrast between mortality 
rates as a function of fish size in the absence and 
presence of mortality due to fishing. Natural 
rates of mortality decline dramatically with 
increasing size early in life, until reaching a 
low level for the remainder of life (purple). 
Fishing greatly increases the mortality of large 
fish (green). Arrows represent the direction of 
selection on body size in the absence (purple 
arrow) and presence of fishing (green arrow). 


increases fitness — that is, the likelihood that 
one’s genes will be passed on to future genera- 
tions. By causing greatly increased mortality 
at large sizes, fishing selects for fish that grow 
slowly and mature at small sizes. Numerous 
other physiological, behavioural and reproductive
traits likewise evolve that can lower fitness.
Taken to its extreme, many generations
of intense size-selective fishing could in theory 
cause the evolution of a population of runts. 
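
The reversal of selection described here can be illustrated with a toy calculation. The sketch below is not the model used in any of the studies discussed; the functional forms and every parameter value are assumptions, chosen only to mimic the shapes in Fig. 1 (natural mortality that falls with size, and fishing mortality that switches on around a hypothetical minimum landing size).

```python
import math

def natural_mortality(size_cm, base=0.2, extra=1.5, scale=10.0):
    """Assumed natural mortality rate: high for small fish, levelling off at `base`."""
    return base + extra * math.exp(-size_cm / scale)

def fishing_mortality(size_cm, f_max=1.0, min_size=40.0, steepness=0.3):
    """Assumed fishing mortality rate: a logistic ramp around a minimum landing size."""
    return f_max / (1.0 + math.exp(-steepness * (size_cm - min_size)))

def total_mortality(size_cm, fished):
    """Natural mortality alone, or natural plus fishing mortality."""
    m = natural_mortality(size_cm)
    return m + fishing_mortality(size_cm) if fished else m

def mortality_slope(size_cm, fished, step=0.01):
    """Numerical derivative of mortality with respect to size.
    Negative: being bigger is safer. Positive: being bigger is riskier."""
    hi = total_mortality(size_cm + step, fished)
    lo = total_mortality(size_cm - step, fished)
    return (hi - lo) / (2 * step)

# Compare the direction of the slope with and without fishing at three sizes.
for size in (10, 40, 60):
    print(size,
          round(mortality_slope(size, fished=False), 4),
          round(mortality_slope(size, fished=True), 4))
```

Under these assumed curves the slope is negative at every size when there is no fishing, so larger size is favoured, but it turns positive around the landing size once fishing is added, which is the sense in which fishing favours fish that stay small.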
The introduction of darwinian principles
into fisheries science has been controversial.
Some have argued that adequate proof of evolu- 
tionary changes caused by fishing has not been 
demonstrated. That would require changes in 
traits such as growth rate to be shown to have 
a genetic basis. This is extremely difficult to do 
in the wild because environmental and genetic
influences are confounded, although new
statistical methods have enabled the evolution
of certain traits (such as size at maturity) to
be revealed. Lab experiments, in which environmental
conditions are standardized, can
demonstrate genetic change but have been
criticized for not representing real fisheries in
the wild.

Edeline et al. now enter the fray. They took
advantage of a unique 50-year time series 
of data on growth rates of pike (Esox lucius) 
in Lake Windermere in northwest Eng- 
land. This lake was fished for centuries until 
1921, when the net fisheries were closed. Net 
fishing did not reopen until 1944. Each year 
from 1944 onwards, biologists tracked the 
age and growth of individual pike landed in 
the fishery by measuring the annual rings 
that form in certain bones, much like reading 
the rings in a tree trunk. They also tagged 
and recaptured pike, providing estimates of 
population size and mortality. 
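
The article does not say how the tagging data were converted into population estimates. As a minimal sketch of the general idea only, the classic Lincoln-Petersen mark-recapture estimator, in Chapman's bias-corrected form, is shown below. The function name and the numbers are hypothetical and are not the Windermere figures.

```python
def chapman_estimate(marked, caught, recaptured):
    """Chapman's bias-corrected form of the Lincoln-Petersen estimator.

    marked      -- fish tagged and released in the first sample
    caught      -- fish caught in the second sample
    recaptured  -- fish in the second sample that carry a tag
    """
    if recaptured > min(marked, caught):
        raise ValueError("recaptures cannot exceed either sample size")
    return (marked + 1) * (caught + 1) / (recaptured + 1) - 1

# Hypothetical numbers: 200 pike tagged, 150 caught later, 30 of them tagged.
estimate = chapman_estimate(200, 150, 30)
print(round(estimate))
```

The logic is that the tagged fraction of the second sample should mirror the tagged fraction of the whole population, so few recaptures imply a large population.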

These highly detailed data enabled the 
authors to show in an earlier paper that fishing
did indeed remove the larger, faster-growing
fish, whereas natural sources of mortality
did the opposite. Hence, they hypothesized 
that the sudden resurgence of fishing in 1944 
should cause an evolutionary decline in growth 
rate followed later by an increase as the fishery 
waned over the 50 years. After using statisti- 
cal models to account for the effect of a suite 
of confounding environmental factors, the 
temporal trend in growth rate closely tracked 
the predicted pattern. Several twists and turns 
in growth trajectory seemed to coincide with 
episodes of excessively high fishing and with 
the large-scale death of perch (a prime food 
source for pike) in 1976. In addition, changes 
occurred in the level of reproductive invest- 
ment by young females that were also as pre- 
dicted from evolutionary theory. The authors 
conclude that evolutionary responses to the 
opposing forces of fishing and natural selection 
must be accounted for in managing fisheries. 

Critics will contend that consistency with 
the predictions of evolution is not proof that 
the changes observed were in fact genetic. 
The responses are probably far too rapid to be 
entirely evolutionary as opposed to ecological in 
origin. With only one population under study, 
any interpretation of this sequence of growth 
changes contains an element of story-tell- 
ing. Perhaps the changes in growth rate fit the 
predictions of evolution purely by coincidence. 

Yet this is one of the most data-rich and 
comprehensive analyses of fishery-induced 
evolution ever published. Together with strong 
evidence also emerging from a variety of other 
harvested species, the likelihood that all
such studies are erroneous is becoming vanishingly
small. Moreover, Edeline and colleagues’
approach provides fresh incentive and the 
methodology to test for evolutionary change in 
the many other long-term data sets of age and 
growth that exist for heavily fished species.
DEVELOPMENTAL BIOLOGY


The power of blood 


Paige Snider and Simon J. Conway 


Compared with the masterpiece crafted by nature, even Leonardo da Vinci's 
anatomical drawings of the cardiovascular system seem primitive. In 
creating this system, nature seems to use blood flow as its paintbrush. 


More than a century ago, Thoma noted that
blood vessels carrying a high volume enlarge,
whereas those with low flow regress. Since then,
significant evidence has accumulated to suggest
that the mechanical force created by blood
flow affects gene expression in the developing
embryo. But how does blood flow contribute
to shaping a functional vascular architecture,
when vessel identity and developmental patterning
are genetically predetermined? On
page 285 of this issue, Yashiro et al. take a step
towards solving this puzzle by showing that 
an interplay between haemodynamics (the 
dynamics of blood flow) and genetic factors 
mediates cardiovascular development. 

Among the most common congenital birth 
defects are abnormalities in the growth and 
development of the cardiovascular system; in 
particular, anomalies in the asymmetric remod- 
elling of a transient structure known as the 
branchial arch artery apparatus that occurs in 
6–7-week-old human embryos. Mammals have
five branchial arches, each of which contains an 
arch artery (numbered 1-4 and 6). Blood flows 
out of the heart via branchial arch arteries to 
circulate throughout the embryo. Although the 
arterial system initially forms symmetrically 
(Fig. 1a), in the mature organism, the left fourth 
and sixth arch arteries persist and give rise to 
the aortic arch and pulmonary trunk, whereas 
the right fourth and sixth arch arteries regress 
(Fig. 1b). This results in the asymmetric develop- 
ment of the great vessels of the ventricular 
outflow tract, which forms part of the left and
right ventricles. 

A likely candidate for directing asymmetric 
development was the transcription factor Pitx2, 
which is the product of the only gene known 
to be asymmetrically expressed in the embryonic
tissue that generates the branchial arch.
Moreover, Pitx2 is induced by Nodal, a signalling
molecule that regulates the initial establishment
of left–right patterning in the embryo.
But the overall mechanisms (or genetic pathways)
that govern asymmetric development
of the artery arches remained elusive. 

In search of an answer, Yashiro et al. used
mutant mice that do not express physiologi- 
cally significant levels of Pitx2 in the left side 
of their ventricular outflow tract. The authors 
found that this mutation prevents normal rota- 
tional movement of the outflow tract, which is 
essential for remodelling of the right sixth arch 
artery into a long, narrow vessel with reduced 
blood flow. Consequently, they observed that 
in about half of the mutant mice the right sixth 
arch artery persists in its initial morphology. 

This finding led Yashiro and colleagues to 
propose that, in normal mice, reduced blood 
flow through the right sixth arch artery leads to 
its regression. To test this hypothesis, they per- 
formed a clever microsurgical procedure to ‘tie 
off’ the left sixth arch artery, thereby reducing 
blood flow through it (Fig. 1c). They reasoned 
that if this treatment causes unexpected regres- 
sion of the left sixth arch artery, then haemo- 
dynamics has a pivotal role in the asymmetric 
development of the arterial system. They found 
that microsurgical ligation of this artery does 
result in its regression, and that the usually 
regressing right sixth arch artery — which 
after ligation receives normal levels of blood 
— persists. 

The authors went on to show that another 
factor that contributes to normal arch-artery 
remodelling is the asymmetric expression of 
vascular growth factors such as PDGF and 
VEGF, because inhibition of these factors results
in the loss of both the left and right sixth arch 
arteries. Remarkably, they find that only the 
sixth arch arteries are sensitive to growth-factor- 
mediated signalling pathways, as the remodel- 
ling of the fourth arch artery was not affected by 
inhibition of these factors’ receptors. 

Although sixth arch arteries depend on Pitx2 
for their normal, asymmetrical remodelling,
Figure 1 | Blood flow shapes the great
arteries. a, The heart of a 10.5-day-old mouse
embryo (the equivalent of a 30-day-old human
embryo) shows symmetrical development of
branchial arch arteries (1-4 and 6); at this stage,
equivalent amounts of blood flow through all
of these arteries. b, By embryonic day 11.5 (the
equivalent of day 35 in human embryos), these
arch arteries have undergone asymmetrical
remodelling. Consequently, the right fourth
and sixth arch arteries regress, and the left
fourth and sixth arch arteries persist to become
the aortic artery and pulmonary trunk. At this
stage, blood predominantly flows through the
left arch arteries; this follows the rotation and
realignment of the outflow tract (green arrow),
and is concomitant with increased expression of
the growth factors PDGF and VEGF. c, To test
whether blood flow determines asymmetrical
remodelling of the branchial arch arteries,
Yashiro et al. surgically ligated the left arch
arteries in 11.5-day-old mouse embryos, thereby
preventing normal blood flow through them.
This led to abnormal regression of the left sixth
arch artery and the parallel development of the
right arteries, into which blood flow was not
obstructed. The anterior-heart-derived cells are
shown in red and blood flow is indicated by pink
arrows. LA, left atrium; LV, left ventricle; RA,
right atrium; RV, right ventricle.


Yashiro et al. found to their surprise that Pitx2 
is not expressed in any of the cells that sur- 
round these arteries. Instead, this transcription
factor is known to be expressed in the
anterior heart, where it orchestrates rotation
of the outflow tract. Given that the anterior
heart requires the action of the growth factor 
Fgf8 (ref. 10), and that mouse embryos lacking 
Pitx2 express abnormal levels of Fgf8 (ref. 11), 
it will be interesting to determine whether Fgf8 
is another mediator of arch-artery regression. 

Earlier work has demonstrated that several 
other mouse mutants that show defects in arch- 
artery laterality do not have either a deficiency 
in the Nodal-Pitx2 signalling pathway or 
abnormal rotation of the outflow tract. To reconcile
these observations with those of Yashiro
and colleagues, it is necessary to find out what 
other mechanisms underlie abnormal remod- 
elling of the arch arteries and whether there is 
a common molecular pathway responsible for
causing these congenital heart defects. 

It also remains to be seen whether the cardio- 
vascular abnormalities in the mutant mice 
studied by Yashiro et al. are due to a direct 
effect of haemodynamics. This is because 
a general reduction in cardiac output and 
blood flow could cause an overall decrease in 
shear-stress-responsive vascular growth factors,
which might, in turn, result in the regression
of certain vessels. Other questions that
need to be addressed include whether direct 
manipulation of the outflow-tract rotation 
would affect arch-artery remodelling, and why 
the sixth arch artery is particularly sensitive to 
haemodynamics. 

Nonetheless, Yashiro and colleagues’ results 
provide a useful model for converting physi- 
cal forces into genetic information — that is, 
the maintenance by haemodynamics of the 
expression of vessel-stabilization factors (such 
as PDGF and VEGF) that shape the asymmetri- 
cal cardiovascular system of mammals. It is also 
exciting that these researchers have successfully 
manipulated embryonic blood flow — a tech- 
nically challenging task — to carry out genetic 
analysis.
50 YEARS AGO

On the morning of November
3, the U.S.S.R. announced from
Moscow the launching of the 
second artificial Earth satellite 
— about a month after the 
launching, on October 4, of the 
first ... For the first time in history, 
a living mammal from the Earth 
is travelling in an Earth satellite, 
for the second one is carrying a 
dog: at the time of writing it was 
“calm and behaving normally” 
according to Russian reports
... The launching of the second
satellite is staggering enough
in itself; but some Russian
scientists have stated that it is 
hoped that the dog will return 
alive. They claim to have solved 
the immensely difficult ‘re-entry’ 
problem; that is, the safe passage 
through the atmosphere in spite 
of the great heat generated 
through friction. This possibility 
will be watched with the keenest 
interest. The greatest peril
which the animal is facing is
the absence, or considerable
reduction in strength, of gravity;
also it is conceivable that cosmic
rays at that height may have
a fatal effect on the nervous
system. 

From Nature 9 November 1957. 


100 YEARS AGO 

No-one more fully understands 
the danger of indiscriminately 
using a questionnaire than
Dr. J. G. Frazer, who is
publishing through the 
Cambridge University Press 
his “Questions on the Customs, 
Beliefs, and Languages of 
Savages” ... They are intended, 
not so much to be put directly 
to the savage, as to indicate to 
the inquirer in the field those 
subjects upon which students 
at home desire information. 
Leading questions should be 
avoided, as they tempt the 
savage to give answers which 
he thinks will be acceptable. 
The savage should be 
encouraged to talk in his usual 
vague way until he has exhausted 
his information for the time, 
when a question judiciously 
asked may jog his memory. 
From Nature 7 November 1907. 
Wolfgang K. H. Panofsky (1919–2007)


Physicist, and passionate and influential advocate of arms control. 


Wolfgang Panofsky — ‘Pief’ to his friends 
and colleagues — died of a heart attack on 
24 September at his home in Los Altos, 
California. He was admired worldwide as
a great physicist and the founding director
of the Stanford Linear Accelerator Center.
But beyond that, he earned universal respect
for his humanity and integrity, and for
the perseverance with which he fought to
achieve the goals that he greatly valued. 

Panofsky was born in Berlin in 1919, 
and was raised in Hamburg until 1934, 
when his father, the eminent art historian 
Erwin Panofsky, was dismissed from his 
professorship at the university because he 
was Jewish. Realizing that their lives were 
at risk as well as their careers, the Panofsky 
family sailed to the United States, settling in 
Princeton, New Jersey. The young Panofsky 
entered Princeton University at the age of 15, 
graduating in 1938 with a major in physics, 
and moved on to the California Institute of 
Technology in Pasadena for his graduate 
studies. He received his PhD in 1942 after 
completing his dissertation based on research 
in the laboratory of Jesse W. DuMond, whose 
daughter, Adele, he married the same year. 

Although officially an enemy alien, 
Panofsky was granted clearance to work on 
the atom bomb and other military projects 
during the Second World War. In 1945 he was 
recruited by Luis Alvarez to the Radiation 
Laboratory at the University of California, 
Berkeley, where he remained for the next six 
years. These were fruitful years of research 
in elementary particle physics for Panofsky. 
Working with several colleagues at the 
accelerators at the Radiation Lab, he made 
important measurements of the properties 
of π mesons (pions, mediators of the strong
nuclear force). Most notably, in collaboration
with Jack Steinberger, he confirmed the
existence of the elusive neutral pion and
discovered its decay into two γ-rays.

The early 1950s were a time of security 
witch-hunts and congressional investigations 
into communist subversion in the United 
States. Panofsky objected on principle when a 
loyalty oath was imposed on the faculty at the 
University of California. Although willing to 
sign this oath, which was later invalidated by 
the courts and thrown out, he felt that it made 
the situation there intolerable and moved to 
Stanford University in 1951 as a professor 
of physics. He also assumed directorship of 
Stanford’s High Energy Physics Laboratory, 
which had an electron linear accelerator that 
he developed into a powerful research tool. 

The outstanding productivity of that 
facility stimulated a proposal to build a
two-mile linear accelerator. The result was
the Stanford Linear Accelerator Center 
(SLAC), with Panofsky serving as director 
from its authorization in 1961 until 1984. A 
measure of his leadership and SLAC’s success 
are the three Nobel prizes for discoveries that 
were made there — the quark structure of 
protons and neutrons; the J/ψ meson whose
constituents are the charmed quarks; and
the heavy τ lepton. Panofsky also supported
major advances in accelerator technology, 
including the development and exploitation 
of electron–positron storage rings and
colliders. 

As a teacher, Panofsky was renowned
for his excellent lectures, his patience and 
his accessibility. He created the same open, 
collegial relationship with students that 
he nurtured at SLAC for the entire staff, 
top to bottom. On the wider stage, he was 
committed to supporting international 
collaboration in science, and to advancing 
the cause of arms control and peace in a
world in which nuclear weapons of 
enormously devastating potential were 
proliferating. 

He pursued those efforts with great vigour, 
whether at Pugwash disarmament meetings, 
participating in official government or 
National Academy of Sciences conferences 
and working panels, or through his wide 
network of personal contacts. He fostered 
bonds of cooperation in the belief that they 
were of great importance even beyond their 
value to science. He saw improving the 
mutual understanding and trust between 
otherwise estranged communities in 
countries with confrontational relationships, 
particularly in the Soviet Union and 
China, as steps towards reducing the 
misunderstandings that could trigger a 
nuclear holocaust. 

Panofsky’s commitment to international 
cooperation in high-energy physics started 
as far back as the 1950s, including serving 
on the high-energy physics subcommittee 
of the International Union of Pure and 
Applied Physics. It continued to the end of 
his life, with his work as an adviser for the 
scientific programme at China's Institute 
of High Energy Physics in Beijing. He took 
special interest in the Beijing Electron/ 
Positron Collider, which SLAC strongly 
supported. 

As a White House science adviser during 
the administrations of presidents Dwight D. 
Eisenhower and John F. Kennedy, Panofsky’s
technical contributions were instrumental 
in the negotiation between the United States 
and the Soviet Union that led, in 1963, to 
the Limited Test Ban Treaty that prohibited
all except underground nuclear explosive 
tests. Throughout his life, he fought for a 
Comprehensive Test Ban Treaty, and against 
the decision to deploy ballistic-missile 
defences, on the grounds of their technical 
limitations and ineffectiveness against 
massive attacks. 

In 1977, his prominence in arms control 
drew the attention of Playboy magazine. The 
article pointed out that “He is 5-feet, 2-inches 
tall, weighs 150 pounds, neither smokes nor 
drinks, and is manifestly, painfully indifferent 
to clothes” — but that, as a “key figure in 
the Strangelove business”, he had helped
the US government avoid pitfalls it might 
otherwise have stumbled into. 

Panofsky received just about every honour 
that science, academia and a national 
government can bestow. His awards 
included the National Medal of Science 
and the Enrico Fermi Award from the 
US government. His many contributions to 
scientific collaborations were recognized 
by his election as an honorary member of 
the leading scientific societies in the United 
Kingdom, France, Russia and China. 

Pief spent the last day of his life in his 
office at SLAC, writing and arguing for 
arms control, and looking forward to the 
publication of his informal autobiography, 
Panofsky on Physics, Politics, and Peace: Pief 
Remembers, which appeared the following 
week. He is survived by his wife Adele 
DuMond Panofsky, 5 children and 11 
grandchildren. 

Sidney D. Drell 

Sidney D. Drell is professor emeritus of the 
Stanford Linear Accelerator Center, Stanford 
University, Menlo Park, California 94025, USA, 
and a senior fellow of the university's Hoover 
Institution. 

e-mail: drell@slac.stanford.edu 
Cover illustration


The centrepiece of this 
landmark collection of 
papers is the publication of 
newly sequenced genomes 
for ten Drosophila species, 
which are compared with the 
two previously known. On 
the cover are anaesthetized 
individuals of all 12 species. 
(Image by Andrew G Clark, 
Cornell University.) 


GENOME 
LABOURS 
BEAR FRUIT 


Approximately 100 years ago now, the fruit
fly Drosophila was first bred at Harvard
for use in laboratory research. These
stocks led to experiments establishing
many early tenets of classical genetics. Hard-core
Drosophila researchers can even trace their own
‘pedigree’ based on their relationship to Thomas Hunt
Morgan, the father of fruit-fly genetics.

In 2007, Drosophila is one of the most Internet-savvy
laboratory organisms, with extensive databases 
devoted to genetics, genomics, taxonomy, breeding 
and mail ordering. Several Nobel Prizes have been 
based in part or in total on work in flies. Flies have 
even been sent on the space shuttle to study immune 
system function. The genome of the most familiar 
species, Drosophila melanogaster, was the test case for 
‘whole-genome shotgun sequencing’, which opened
the doors to the era of organismal genome sequencing.


The community has not rested on these laurels, and 
now marries the pre-eminent position of the fly in 
evolutionary biology with cutting-edge genomics by 
studying 12 completed Drosophila genomes at once. 
Nature is honoured to publish two Articles detailing 
the sequence and analysis, allowing the description 
of ‘evolutionary signatures’ on functional elements 
throughout the genomes. To capture the scope 
of this achievement and celebrate the fruit fly as a 
model organism in basic research, we have also asked 
researchers to explore the past, present and future of 
Drosophila in many diverse areas of biology — from 
physiology and cell biology to neural circuits and gene 
expression. Size doesn’t matter: these tiny fruit flies
are once again poised to take on the world. 


Chris Gunter, Senior Editor 
Francesca Cesari, Associate Editor 
Deepa Nath, Senior Editor
I-Han Chou, Senior Editor
Alex Eccleston, Senior Editor
Come fly with us


Ewan Birney 


The genomes of 12 fly species have been analysed comparatively. Why should we care? Because sequences 
that have resisted the selective forces of evolution from fly to human must have functional significance. 


Geneticists and molecular biologists have 
always had a soft spot for the fruitfly Drosophila 
melanogaster, for this innocuous organism is 
continually providing insights into the biology 
of multicellular organisms. But in the 1990s, 
when the nematode worm Caenorhabditis 
elegans became the life and soul of the genom- 
ics party, and we humans were always the 
guests of honour, flies were in danger of being 
left off the guest list. 

This started to change when Celera Genomics
sequenced the genome of D. melanogaster
as a trial, before tackling the genomes of larger 
species. Now, in the era of evolutionary genom- 
ics, the sequencing of 10, and comparative 
analysis of 12, fly species (reported in this
issue and elsewhere in more than 40 companion
papers) means that flies have overtaken
other species to become a favourite organism 
of genomicists too. 

Every aspect of an organism emerges and 
persists through evolution. Consequently, 
researchers have always used evolutionary 
analysis to understand genomes, in particular 
to identify protein-coding genes that are con- 
served between organisms. But evolutionary 
processes can be studied far more effectively 
than by merely cataloguing the gene content of 
a genome. Specifically, researchers can investi- 
gate two complementary evolutionary aspects: 
negative selection and positive selection. Stark 
et al. (page 219) study negative selection, or
the presence of functional genomic elements
that, despite having undergone many random
mutational events, have not changed in function
(Fig. 1a). By contrast, the Drosophila 12
Genomes Consortium (Clark and colleagues,
page 203) investigate positive selection, or
the acquisition of new functions in different 
species (Fig. 1b). 
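
The contrast between the two kinds of signal can be made concrete with a toy alignment. The sketch below is not the method of either paper: it simply scores each alignment column by how uniform it is. The sequences, species labels and function name are invented for illustration. Columns that stay identical across species are the crude analogue of negative selection, whereas a variable column is at most a candidate for closer study, not evidence of positive selection.

```python
def column_conservation(alignment):
    """Return, for each alignment column, the fraction of sequences that
    carry the most common base in that column (1.0 = fully conserved)."""
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    scores = []
    for i in range(length):
        column = [s[i] for s in seqs]
        most_common = max(set(column), key=column.count)
        scores.append(column.count(most_common) / len(column))
    return scores

# Invented three-species alignment; only the fifth column (index 4) varies.
toy = {
    "species_A": "ATGCCGTA",
    "species_B": "ATGCAGTA",
    "species_C": "ATGCTGTA",
}
scores = column_conservation(toy)
conserved = [i for i, s in enumerate(scores) if s == 1.0]
variable = [i for i, s in enumerate(scores) if s < 1.0]
print(conserved, variable)
```

Real analyses must also weigh how much change would be expected by chance given the evolutionary distances involved, which this sketch ignores.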

The remarkable diversity of fruitfly species 
makes them ideal organisms for such com- 
parative analysis. Consequently, the authors 
studied closely related species such as D. simu- 
lans (pictured) and D. sechellia (which have a 
genetic distance equivalent to that between 
humans and closely related primates), as well 
as more distant drosophilids such as D. grim- 
shawi. This is one of the many exotic Hawaiian 
species, and is physically 100 times bigger than
its normal laboratory cousins, with a genetic
distance between them equivalent to that 
between humans and lizards. 

To discover functional elements and to 
refine our understanding of elements already 
known, Stark et al. draw on most of our current
knowledge of these elements, and use
nature’s own repertoire of mutations and 
selection. They consider all known classes of 
functional element — from the well-under- 
stood protein-coding genes to the more 
elusive motifs that regulate gene expres- 
sion. These analyses allowed the authors 
to identify incorrect biological infor- 
mation ascribed to specific genomic 
sequences of D. melanogaster. 

Stark and colleagues iden- 
tify several evolutionarily 
conserved elements embed- 
ded in continuous sequences 
of coding DNA. These 
include stop codons (three- 
nucleotide sequences 
that signal termination 
of a protein sequence)
and frameshift muta- 
tions, which throw 
the coding sequence 
out of step. It is hard 
to imagine that such 
gene structures — in 
which, for example, 
stop codons transcribed into messenger 
RNAs are ignored by the protein-transla- 
tion machinery — are compatible with the 
normal rules of translation. So these findings 
strongly indicate the existence of additional, as 
yet unknown, mechanisms for the pre-trans- 
lational processing of mRNAs, or alternative 
modes of translation. 

MicroRNAs are short sequences of naturally 
occurring, single-stranded RNA that regulate 
gene expression. The authors next investigated 
genes for non-coding RNA sequences, such 
as microRNAs, and identify new microRNA 
sequences, thereby expanding the list of these 
regulatory sequences in D. melanogaster from 
74 to 101. 

Regulatory motifs are another type of func- 
tional element Stark et al. studied. The authors 
provide both an extensive ‘dictionary’ of such
motifs and, for the first time in a genome-wide 
manner for an organism, a set of instances in 
which such motifs are putatively functional. 
Using genomics to identify cases of regula- 
tory-motif activity is, indeed, an exciting new 
approach, and uses what the authors call 
‘branch length score’. This method takes into 
account the alignment and sequencing errors 
that are common in real data, and it can be 
applied to the whole of a phylogenetic tree. 
Stark et al. carefully assess different 
statistical aspects of their method, 
providing a goldmine of functional 
elements that can be confidently 
used by molecular biologists study- 
ing flies and by laboratories interested 
in gene regulation. 
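
The text does not spell out how the branch length score is computed. A minimal sketch follows, assuming the score is the total branch length of the subtree connecting the species in which a motif instance is conserved, expressed as a fraction of the branch length of the whole tree. The tree, its branch lengths and all names below are invented for illustration, and this is not the authors' implementation.

```python
def branch_length_score(parent, length, leaves_with_motif):
    """parent: node -> parent node (the root maps to None)
    length: node -> length of the branch above that node
    leaves_with_motif: leaves in which the motif instance is conserved."""
    selected = set(leaves_with_motif)
    below = {node: 0 for node in parent}
    for leaf in selected:
        node = leaf
        while node is not None:        # walk from the leaf up to the root
            below[node] += 1
            node = parent[node]
    k = len(selected)
    # A branch lies on the connecting subtree if selected leaves sit on both sides of it.
    connecting = sum(length[n] for n in parent
                     if parent[n] is not None and 0 < below[n] < k)
    total = sum(length[n] for n in parent if parent[n] is not None)
    return connecting / total

# Invented tree: ((A:0.1, B:0.1)X:0.2, C:0.5)root
parent = {"root": None, "X": "root", "C": "root", "A": "X", "B": "X"}
length = {"X": 0.2, "C": 0.5, "A": 0.1, "B": 0.1}

score_close_pair = branch_length_score(parent, length, ["A", "B"])
score_all = branch_length_score(parent, length, ["A", "B", "C"])
print(round(score_close_pair, 3), round(score_all, 3))
```

On this definition, a motif that is missing from one species because of an alignment or sequencing error still earns credit for the branches on which it is seen, instead of being discarded outright.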

Compared with the work of Stark 
et al., Clark and colleagues’ findings
on aspects of positive selection are of 
less direct use to molecular biolo- 
gists working on D. melanogaster. 
Instead, their results provide for
the first time a comprehensive
set of genome-wide insights into 
how organisms arise during evo- 
lution. Statistically, the authors’ 
analysis is not as powerful as that 
of Stark and colleagues. But this is 
not surprising, because their aim 
was to understand positive selec- 
tion, which occurs in a non-con- 
tinuous manner across the different 
Drosophila lineages, whereas the negative 
selection studied by Stark et al. is relatively 
constant and can easily be aggregated across 
the entire data set (Fig. 1). 

Nevertheless, Clark and colleagues pro- 
vide valuable insights into the evolution of 
Drosophila species. For example, by compar- 
ing the genomes of 12 drosophilid species, 10 
of which they have sequenced and present in 
this issue, they show that, on both large and 
small scales, genomic rearrangements are 
extremely common in these genomes. They 
also find that about a third of the genes have 
undergone positive selection through muta- 
tions that affect the position of at least one 
amino acid. This suggests that positive selection 


Figure 1 | Two types of evolutionary selection. a, For their analysis, Stark et al. studied negative
selection (blue), in which specific bases (from the four possible ones, A, T, C and G) remain roughly 
constant across the genomes of all lineages to ensure the conservation of functional genomic elements. 
Such an analysis uses three main methods: identifying conserved protein-coding sequences (1); 
identifying conserved paired bases in non-coding RNA genes (2); and identifying conserved specific 
motifs in the locale of the alignment (3). b, By contrast, Clark and colleagues searched for cases of
positive selection (red), which results in modification of specific bases in different species, leading to 
the acquisition of new functions. Two main methods are used to study positive selection: identifying 
fast-evolving codons embedded in a set of negatively selected codons (4), and searching for fast- 
evolving base pairs in the context of a non-coding RNA structure (5). The central tree, which indicates 
the phylogeny of the species (not all branches are shown), highlights the fact that, whereas negative 
selection is relatively continuous, positive selection is intermittent. Black lines in 2 and 5 show base 
pairing in the secondary structure. So although the positions paired may not show conservation on 
each blue column, the paired positions maintain a valid base pair. 


occurs across many genes in a genome. 

Codon-usage bias is the selective use by 
an organism of certain codons from a pool 
of codons that all specify a given amino acid, 
and it varies between different organisms. 
Clark and colleagues discover that, com- 
pared with other drosophilids, one species, 
D. willistoni, shows substantially reduced 
codon-usage bias. 
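
One common way to quantify this bias, offered here as an illustrative sketch and not as the measure used by Clark and colleagues, is relative synonymous codon usage: the observed count of a codon divided by the count expected if every codon for that amino acid were used equally often. The codon table below is deliberately partial and the input sequence is invented.

```python
from collections import Counter

# Partial codon table (standard genetic code), enough for the toy example.
SYNONYMOUS = {
    "Lys": ["AAA", "AAG"],
    "Gly": ["GGT", "GGC", "GGA", "GGG"],
    "Leu": ["TTA", "TTG", "CTT", "CTC", "CTA", "CTG"],
}

def rscu(coding_sequence):
    """Relative synonymous codon usage for the amino acids in SYNONYMOUS.
    A value of 1.0 means the codon is used exactly as often as expected
    under equal usage; higher values mean it is preferred."""
    codons = [coding_sequence[i:i + 3]
              for i in range(0, len(coding_sequence) - 2, 3)]
    counts = Counter(codons)
    result = {}
    for amino_acid, family in SYNONYMOUS.items():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue                      # amino acid absent from the sequence
        expected = total / len(family)
        for c in family:
            result[c] = counts[c] / expected
    return result

# Invented sequence: three AAG codons and one AAA codon, all encoding lysine.
biased = rscu("AAGAAGAAGAAA")
print(biased)
```

A genome with reduced codon-usage bias would show values clustered closer to 1.0 across its genes than a genome with strong preferences.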

The authors also show that genes encoding 
proteins involved in olfaction and immunity 
— the usual suspects for positive selection 
among protein-coding genes — have evolved 
faster than the rest of the genome. Rapid evolu- 
tion was also seen in genes that regulate spe- 
cific aspects of Drosophila physiology, such as 
insecticide resistance. During its long asso- 
ciation with humans, Drosophila has endured 
radical changes to its environment, ranging 
from the introduction of insecticides to the 
transfer of species through human migration. 
We can therefore probably expect many inter- 
esting studies attempting to correlate genomic 
changes with such events in the fly’s evolution- 
ary history. 

What are the broader implications of the 
findings of these two studies”’, particularly 
for further study of the human genome? In the 
case of negative selection, the evolutionary- 
genomics approach taken by Stark et al. clearly 
provides impressive insights into functional 
elements that are conserved across a clade (a 
group of related organisms). The proposed 
Mammalian Genome Project*, which is well 
under way, is likely to have roughly the same 
statistical power as Stark and colleagues’ data
set. This would mean that we have a collection of
powerful exploratory methods that can be 
applied to large-scale genomic analysis in 
mammals, and that are complementary to 
experimental techniques. In theory, there 
should be no qualitative difference in gen- 
erating results for the Drosophila and mamma- 
lian clades using these methods. Nonetheless, 
the larger size of mammalian genomes, and 
the fact that there are potentially more fluc- 
tuations in the rate of neutral evolution, both 
across the genome of one species and between 
genomes of different species, may pose some 
interesting problems to be overcome. 

Researchers are concerned that data 
obtained using methods based on evolution- 
ary-genomic analysis do not entirely overlap 
with those obtained through experimental dis- 
covery methods, such as ChIP-chip and ChIP- 
seq, which generate comprehensive in vivo 
maps of transcription-factor binding sites and 
other functional DNA elements. In particular, 
these experimental techniques often define a 
set of elements that are not identified as con- 
served by the sensitive criteria of evolutionary- 
genomic analysis. As discussed previously° and 
by Stark et al., this mismatch seems to be con- 
sistent across species and analyses performed 
by different laboratories. So it probably reflects 
our lack of understanding of how seemingly 
neutral evolutionary processes give rise to new, 
biochemically active elements before selection 
kicks in, rather than the existence of a large 
portion of lineage-specific elements, or defects 
in the methods used. 

The analysis of positive selection by Clark 
and colleagues’ is undoubtedly the broadest
and most detailed investigation performed 
in any clade of multicellular organisms. Their 
study emphasizes the fact that, to under- 
stand differences between species, and thus 
how evolution leads to adaptive changes, we 
must improve the methods we use, and look 
at larger data sets and a broader range of spe- 
cies. This argument favours both sequencing 
the genomes of more species — now a realistic 
prospect given the advent of radically cheaper 
sequencing technologies — and determined 
efforts to carry out experimental studies on 
other members of each clade. Such studies are 
essential to any attempts to correlate sequence 
changes with changes in functional elements, 
and so test any new methods developed. 

For the drosophilids, the next phase should 
entail sequencing the genomes of yet more fruit- 
flies and other members of the order Diptera, 
thereby adding to the sequenced genomes of 
the drosophilids discussed here and their dis- 
tant cousins, mosquitoes”. Moreover, more 
sequences should be generated at the popula- 
tion level — that is, we should sequence several 
individuals of the same species to gather the 
raw material for classical population-genetic 
analysis, which can be used for comparison 
with evolutionary data. Attempts to generate 
such resources are well under way for some 
drosophilid species. Finally, concerted efforts 
to obtain new experimental results in other 
species, beyond the experimental workhorse 
D. melanogaster, are needed for comparison with 
data obtained through evolutionary analysis. 

Clark and colleagues’ findings suggest that, 
to understand the fascinating adaptive changes 
among primates, including those unique to 
humans, we probably need to sequence the 
genome of every extant primate (and, where 
possible, any extinct primates with recoverable 
DNA), using optimal sequencing strategies to 
obtain both population-level data and accurate 
genome sequences. Basic molecular-biological 
studies on cell lines from selected primate spe- 
cies will also be needed to correlate sequence 
changes with changes in functional elements. 

Returning to the present, the data presented 
and analysed by Stark et al.” and Clark and col- 
leagues’ provide the first significant example of 
the power of evolutionary genomics, which will 
be a central research theme for the next decade.
It also means that genomicists can finally
join their geneticist and molecular-biologist 
colleagues in the fruitfly fan club.
Ewan Birney is at the European Bioinformatics 
Institute, Wellcome Trust Genome Campus, 
Hinxton, Cambridge CB10 1SD, UK. 
e-mail: birney@ebi.ac.uk 
Drosophila and the genetics of the internal milieu


Pierre Leopold’ & Norbert Perrimon* 


‘Homeostasis’, from the Greek words for ‘same’ and ‘steady’, refers to ways in which the body acts to maintain a stable 
internal environment despite perturbations. Recent studies in Drosophila exemplify the conservation of regulatory 
mechanisms involved in metabolic homeostasis. These new findings underscore the use of Drosophila as a model for the
study of various human disorders.


In 1865, Claude Bernard, in his Introduction to the Study of
Experimental Medicine, proposed the concept of the ‘internal
milieu’, later referred to as ‘homeostasis’ in 1932 by Walter
Cannon. Homeostasis is one of the most remarkable properties
of complex systems that permit organisms to function effectively in a 
broad range of environmental conditions and allow survival against 
fluctuations such as temperature, salinity, acidity and nutrients. The 
kidney, for example, contributes to homeostasis by maintaining 
salt and ion levels in the blood, regulating the excretion of urea, 
reabsorbing substances into the blood and regulating blood water 
levels. Homeostasis depends on the dynamic action and interaction 
of a number of sensors to adapt to ever-changing environmental 
conditions, and on hundreds of positive and negative feedback 
mechanisms. 

In recent years, and perhaps surprisingly, we have learned, mostly 
from genetic studies of physiological responses in Drosophila, 
that many parallels exist between invertebrate and mammalian 
homeostasis. As exemplified by the seminal studies of Vincent 
Wigglesworth’, insects have historically been long-standing models 
in physiology as a result of their technical advantages such as short 
life cycles, large populations and the possibility of simple surgical 
procedures. In the past, however, interest in their physiology was 
driven mostly by intellectual fascination with their diverse physio- 
logical adaptations to virtually every habitat on Earth, as well as by 
potential applications for pest control. Here we review several recent 
findings that establish Drosophila as an emerging model for mam- 
malian physiology. 


Metabolic homeostasis 

Drosophila and all other higher organisms constantly adapt their 
energy needs to nutritional status through metabolic regulation; that 
is, sugar and lipid homeostasis. The emerging picture in the fly is that 
of a simpler and well-balanced integrated system between various 
organs, each with distinct physiological roles in maintaining energy 
homeostasis (Fig. 1). Drosophila has a less complex genome than 
vertebrates and has little gene redundancy. For example, flies have
a single insulin receptor that mediates all functions that have so far
been attributed to insulin-like peptides. Sugar levels are maintained 
by neurosecretory cells located in the brain and ring gland (together 
forming a bipartite ‘Drosophila pancreas’) that secrete insulin and 
adipokinetic hormone (AKH, the insect glucagon) into the open 
haemolymph (the ‘Drosophila blood’)** (reviewed in refs 6-8). 
Excess sugar is stored in the form of glycogen that accumulates both
in muscles and in the fat body (the ‘Drosophila liver’). On stimulation
by AKH, the fat-body glycogen phosphorylase is activated and sugar 
(trehalose in insects) is released into the haemolymph. The fat body 
also acts as the storage place for fat and thus is reminiscent of the 
‘white fat’ of vertebrates. A recent series of studies illustrates the
convergences in the control of lipid metabolism between flies and
mammals. When food is scarce, fat is released from the fat body into the
haemolymph and is then captured by the oenocytes (the fly ‘hepato- 
cytes’) for energy production’. In both systems, information on 
energy shortage is relayed by lipolytic hormones, which activate a 
pathway dependent on cyclic-AMP-dependent protein kinase that 
leads to the activation of specific lipases at the surface of intracellular 
lipid droplets. However, in flies the specific role of these lipases is still 
unclear: both adipose triglyceride lipase (ATGL) knockout mice and 
brummer (the ATGL orthologue in flies) mutant flies are obese as a 
result of an impairment of lipid mobilization'®”’. 

Genetic ablations have had major roles in deciphering the elabo- 
rate crosstalk between the various sensor tissues and their responsive 
organs. For example, ablation of insulin-producing cells leads to 
diabetic flies and, conversely, flies in which AKH-producing cells 
have been ablated have low levels of circulating sugars*'*. Further, 
proper mobilization of fat from the fat body is perturbed in the 
absence of oenocytes—an interaction reminiscent of the crosstalk 
between hepatocytes and adipocytes in humans”!’. Additional
insights have been obtained from the use of drugs such as
sulphonylurea (glyburide or tolbutamide), currently used in the treatment of
type 2 diabetes as activators of insulin secretion by pancreatic beta
cells’*. Drosophila larvae fed with sulphonylurea show increased sugar
levels, suggesting that AKH-expressing cells in the fly use a conserved
mechanism for coupling circulating sugars to glucagon and AKH
release’.

Figure 1 | Interactions between the various organs involved in metabolic
homeostasis in a Drosophila larva. Hormone names are shown in red, tissue
names in blue. DILP, Drosophila insulin-like peptides; TAG, triacylglycerol.

An important notion emerging from studies in the fly is that all 
tissues in the organism might not be exposed or equally sensitive to 
variations in nutritional conditions. The fat body, in particular, 
seems to be a unique metabolic sensor because it has a pivotal func- 
tion in buffering nutritional information for the entire animal'*”. 
For example, in conditions of amino-acid restriction, the conserved 
TOR pathway in fat cells functions as an energy sensor that induces a 
remote and systemic inhibition of insulin/insulin-like growth factor 
signalling, leading to a general decrease in growth during larval 
development'®. 

Cross-regulation between tissues for the control of insulin signal- 
ling is reminiscent of recent observations in mammals related to 
insulin resistance, which leads to obesity and type 2 diabetes, two 
components of a highly prevalent and complex disorder called the 
metabolic syndrome’*’’. Whether Drosophila is a good model for 
dissecting the basic molecular and genetic mechanisms of this 
disorder is still uncertain; however, the first example of
insulin-resistant flies has emerged recently. Teleman et al.”° showed that flies
that do not express the microRNA-encoding gene miR-278 in the fat 
body are lean and have elevated levels of sugar in the haemolymph, 
despite high expression levels of insulin. Further, whereas injection of 
insulin into wild-type larvae provoked an activation of the insulin 
pathway in fat cells, this response was abrogated in miR-278 mutant 
animals, indicating that the ‘lean fly’ phenotype is associated with 
insulin resistance in the fat body”. These findings are reminiscent 
of the decreased fat-pad mass observed in the fat-specific insulin 
receptor knockout model in the mouse’'. Thus, as in mammals, 
insulin signalling in Drosophila regulates patterns of energy storage 
in fat and other tissues, and disruption of this process by insulin 
resistance leads to important lipid disorders. 


Interaction between metabolism and other processes 


In addition to progress in our understanding of the logic of metabolic 
homeostasis in the fly, important advances are being made in our 
knowledge of the complex interactions between metabolism and 
growth, ageing or behaviour. For relevant reviews in the Drosophila 
literature see refs 22-25. How organisms adapt their growth pro- 
gramme to changes in energy needs and status, and how hormonal 
systems such as insulin and steroid hormones interact with metabolic 
regulation, are areas of particular interest for future research. In 
particular, functional interactions recently identified between insulin 
and the steroid hormone ecdysone**”’ suggest that fly steroids also 
participate in metabolic control. Further, in adult flies a decrease in 
metabolic rates resulting from dietary restriction or decreased insulin 
signalling results in extension of the normal lifespan (reviewed in 
refs 23, 24). The mechanism by which dietary restriction controls 
lifespan is unclear, but metabolic studies in this short-lived model 
system will undoubtedly contribute to an explanation of the control 
of ageing by nutrition. 

Finally, several complex behaviours are modulated by metabolic 
conditions. Feeding behaviour is regulated by two major inter- 
mingled controls: the homeostatic and the hedonic systems”. 
Homeostatic regulation of feeding ensures that circulating nutrient 
levels are sensed, which in turn directly regulates feeding activity. 
Work on Drosophila and studies on the mouse recently showed that 
a TOR-dependent molecular sensor for amino acids exists in specific 
brain cells (hypothalamic cells in the mouse; neuroendocrine cells in 
Drosophila). These cells sense variations in circulating amino-acid 
levels and in turn regulate feeding”. Understanding the circuits 
that transmit these neuroendocrine signals is of prime interest for 
the study of eating disorders, and these circuits are beginning to be studied in
Drosophila’. 


Concluding remarks 


The goal of this review is to provide a selective overview of recent 
progress in studies of metabolic homeostasis in Drosophila. Through 
the examples cited, it is clear that many questions will be answered in 
the next few years. For example, studies will decipher the specific 
crosstalk between various larval and adult tissues for the control of 
energy homeostasis. Characterization of mutants associated with 
insulin resistance is likely to provide insights into the mechanisms 
of insulin response and its disruption. In addition, Drosophila has 
great potential as a model system in which to study the balance 
between lipid storage and lipolysis, a process whose disruption in 
humans leads to obesity or type 2 diabetes. 

In flies, the characterization of homeostatic regulatory mechan- 
isms is facilitated by the smaller number of homeostatic constants 
and, of course, the ever-improving arsenal of available genetic tools. 
Genetic screens have already identified new components of metabolic 
regulatory networks (such as melted, a new modulator of FOXO and 
TSC2 activity’'), whose function can now be tested in mammals. 
Genome-wide RNA-mediated interference (RNAi) screens, both in 
tissue culture cells** and in vivo*’, provide unique tools for the study 
of metabolic regulations. In particular, in vivo RNAi, allowing the 
activity of a specific pathway to be reduced in a given tissue, should 
showcase Drosophila as a model for metabolic disorders. These 
Drosophila models should prove to be very powerful for drug 
screening, to determine modes of drug action, and to identify drug 
targets. Finally, a better characterization of Drosophila metabolites 
will benefit the continuing work to understand experimental pertur- 
bations of the ‘internal milieu’. 
Orchestrating size and shape during morphogenesis


Thomas Lecuit’” & Loic Le Goff!” 


Living organisms exhibit tremendous diversity, evident in the large repertoire of forms and considerable size range. 
Scientists have discovered that conserved mechanisms control the development of all organisms. Drosophila has proved 
to be a particularly powerful model system with which to identify the signalling pathways that organize tissue patterns. 
More recently, much has been learned about the control of tissue growth, tissue shape and their coordination at the 
cellular and tissue levels. New models integrate how specific signals and mechanical forces shape tissues and may also
control their size.


Questions such as how can groups of cells make up organized
tissues, organs and bodies, how can development
produce organisms with reproducible morphological
patterns, and what mechanisms underlie the diversity of
organ size and shape (Fig. 1) have haunted scientists for over a cen- 
tury. From the early observations of embryology to the quantitative 
models of systems biology, important discoveries marked the long 
history of morphogenesis. 

Drosophila has proven to be a powerful system with which to 
elucidate the molecular mechanisms of morphogenesis, identifying 
the signals that pattern the body plan and characterizing cell 
mechanics and dynamics underlying tissue remodelling. A principal
challenge is to understand within a single mechanistic framework 
how these patterning signals and cellular responses—such as cell 
division and cell shape changes—are coordinated in tissue growth 
and tissue remodelling. 


Figure 1 | Diversity of size and shape of organs during morphogenesis.
Wings of dipterans illustrate marked differences in the size and shape of
organs. Variations in wing length can almost reach tenfold (left). The
width-to-length ratio can also vary significantly (right). Scale bar, 1 mm.
(Images are courtesy of N. Gompel and B. Prud’homme.)

The size and shape of genetically marked clones of cells reflect in
miniature the size and shape of the tissue they belong to. Cell divi-
sion, cell death, cell shape changes and cell rearrangements are the
building blocks on which tissues are shaped and organs are made
(Fig. 2). The orchestration of these elementary processes depends on
a constraining genetic programme operating on cell behaviour: for
instance, a specific set of signalling molecules, growth factors, pro-
mote cell divisions and tissue size, whereas other proteins control the
orientation of cell divisions, oriented cell rearrangements and so on,
and hence tissue shape. A surveillance mechanism orchestrates
proper tissue size and shape and involves reciprocal interactions
between the cell and tissue scales. When a group of cells dies, com-
pensatory mechanisms controlled at the tissue level ensure that the
proper tissue size and shape are not compromised.

The aim of this review is to highlight recent important findings on
the mechanisms of tissue growth and shape and to encapsulate them
in a single framework of morphogenesis. We first focus on how cell
division and cell death control tissue growth. We then detail how the 
mechanics of cell shape and division underlie tissue shape. Finally, we 
discuss how feedback mechanisms may orchestrate tissue size and 
shape. 


Tissue growth: to die, to survive, to divide 


Tissue growth can be best studied in the Drosophila developing adult 
tissues called imaginal discs. Imaginal discs are epithelial layers grow- 
ing from about 40 cells to 50,000 cells in 4 days of continued divi- 
sions. Although this massive increase in cell number and tissue mass 
is under organismal control as far as the provision of the necessary 
energy input is concerned, the control of tissue size is intrinsic to the 
disc. Proper tissue size is not reached by counting cells: changes in cell 
size often yield compensatory modifications in cell number, thereby 
maintaining tissue size’’. This suggests that tissue dimensions (size 
or mass) may be measured. 

Cell competition and apoptosis. Tissue-level control of tissue size is 
manifest in the process of cell competition discovered 30 yr ago*’, 
whereby faster growing cells can out-compete slow-growing cells 
(Fig. 2c). For example, wild-type clones can take over entire com- 
partments initially occupied by slow-growing cells heterozygous for 
the Minute (M) mutations in genes encoding ribosomal proteins. 
Myc is another major regulator of cell competition, with as little as 
twofold changes in Myc expression being enough to trigger over- 
growth of cells and competition with surrounding wild-type cells®. 
The cellular mechanisms underlying competition are only starting 
to be unravelled. To some extent, fast cells may compete with slow 
cells for a limited amount of survival signals provided by the
transforming growth factor (TGF)-β/BMP (bone morphogenetic protein)




Figure 2 | Cellular mechanisms of tissue size and shape. a, Tissue 
proliferation and the increase in tissue mass are driven by continuous cell 
divisions (outlined in red). b, Oriented cell divisions, here along the 
horizontal axis, cause the elongated growth of the clone and of the organ. 
c, Cell competition is the process by which a fast-growing population (red) 
out-competes a slow-growing one (white). Out-competed cells die by 
apoptosis (cross symbol). This process is implicated in tissue size regulation. 
d, Cell rearrangements such as intercalation drive tissue elongation and 
affect tissue shape. Here the red interfaces shrink and new horizontal 
interfaces (blue) are formed, producing an exchange in cell neighbours. 
molecule Decapentaplegic (Dpp)’. There is no consensus, however,
on the exact importance of Dpp in the competition process”®. 
Competition also involves apoptotic elimination of the slow cells 
and their engulfment by the fast-growing cells*. The stress-response 
pathway mediated by Jun N-terminal kinase (JNK)’ and the pro- 
apoptotic genes hid (also called Wrinkled) and rpr*® were shown to 
be involved in the apoptosis of the out-competed cells. The link 
between cell competition and tissue size is manifest in the following 
set of experiments: uniform expression of myc, where no competition 
occurs, causes tissue overgrowth, whereas mosaic expression of myc, 
which triggers competition, leaves size unchanged, indicating that 
the out-competed cells buffer the overgrowth of myc-overexpressing 
cells. Consistent with this, mosaic expression of myc results in tissue 
overgrowth when cell competition is reduced by blocking apoptosis. 
Another notable observation indicates that cell competition in a 
wild-type tissue buffers variations in tissue size’. 
Control of cell division. Control of tissue size also involves a regu- 
lation of cell division. Two remarkable properties of cell division in 
imaginal discs are that it is random but uniform across the discs and 
that it ceases uniformly when correct disc proportions are attained. 
Two models have been proposed to explain scale invariance in grow- 
ing tissues. One model emphasizes the role of local communications 
between cells with different positional values to drive intercalary 
growth’. These communications could be mediated by the cell adhe- 
sion molecule Fat, an activator of the Hippo pathway that controls 
cell proliferation (reviewed in ref. 10). Alternatively, long-range sig- 
nalling by extracellular morphogens is viewed as the principal deter- 
minant of growth'’. Morphogens are molecules that form gradients 
of concentration from a source and activate different target genes at 
different concentration thresholds. The morphogen Dpp controls 
tissue pattern'”'’ and tissue growth'*’’. Day and Lawrence"! pro- 
posed that the slope of the gradient promotes cell division above a 
certain threshold. Provided that the addition of new cells decreases 
the slope of the gradient, growth would arrest when the gradient 
becomes too shallow (Fig. 2). Consistent with this, it was elegantly 
shown that cell division is transiently induced in regions where the 
slope of the Dpp gradient is experimentally modified"®. Several obser- 
vations, however, contradict a simple formulation of this model: (1) 
uniform Dpp expression causes overgrowth; (2) the assumption that 
the Dpp ligand gradient scales with the tissue is not experimentally 
supported'”’*; (3) the model fails to account for uniform cell division 
in the tissue. Thus, additional mechanisms will be required to explain 
fully the control of tissue size. As detailed below, the mechanical 
constraints imposed by tissue growth on local cell division can also 
be considered in parallel with signalling. 
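
The logic of the gradient-slope model can be sketched in a few lines. This is a deliberately simplified illustration, not the formulation of the cited work: it assumes a linear gradient pinned at fixed concentrations at the source and at the tissue edge, so that every added cell makes the per-cell slope shallower. The function name `grow_until_shallow` and all numerical values are hypothetical.

```python
def grow_until_shallow(n_cells, c_source, c_edge, slope_threshold,
                       max_rounds=1000):
    """Add one cell per round while the gradient slope exceeds a threshold.

    Assumption (for illustration only): the morphogen concentration falls
    linearly from c_source to c_edge across n_cells, so the slope per cell
    is (c_source - c_edge) / n_cells and decreases as the tissue grows.
    """
    history = [n_cells]
    for _ in range(max_rounds):
        slope = (c_source - c_edge) / n_cells  # concentration drop per cell
        if slope <= slope_threshold:
            break  # gradient too shallow: growth arrests
        n_cells += 1
        history.append(n_cells)
    return n_cells, history


# Hypothetical numbers: growth stops once the drop per cell reaches 0.5 units.
final_size, history = grow_until_shallow(
    n_cells=10, c_source=100.0, c_edge=0.0, slope_threshold=0.5
)
```

Under these assumptions the final size is set by the ratio of the concentration difference to the threshold slope, which shows why the model predicts arrest at a reproducible size. It also makes the objections listed above concrete: the sketch relies on the gradient stretching with the tissue, and a single global slope cannot by itself account for spatially uniform division.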

Whereas an increase in cell number drives tissue growth, tissue 
shape involves changes in cell positions controlled by cell rearrange- 
ments and the orientation of cell division. 


Tissue shape: orienting cell division and movements 
Spatial control of cell divisions. A number of mechanisms have been 
proposed for tissue elongation. It was suggested a long time ago that 
polarized cell divisions might be important for morphogenesis in 
Drosophila’’ (Fig. 1a, b). However, the major role of polarized cell 
rearrangements during cell intercalation in vertebrates and inverte- 
brates (see below) overshadowed this mechanism. As a result, experi- 
mental evidence that polarized cell division also has an essential role 
in plant and animal morphogenesis only accumulated recently”, 
with striking examples in Antirrhinum petal morphogenesis”’ and 
zebrafish neurulation™. In Drosophila too, polarized cell divisions 
occur and participate in tissue morphogenesis. A detailed analysis 
of Drosophila imaginal discs showed, for example, that clones of cells 
grow anisotropically along the axis of tissue growth because cell 
divisions are biased along the proximal/distal axis*’. Elongation of 
Drosophila embryonic epithelia is also controlled to some extent by 
oriented cell divisions”®. 
What controls the orientation of cell division? Several components
of the planar cell polarity pathway (PCP)—that orient other pro- 
cesses such as hairs and cilia—have been implicated. For instance, 
the cell adhesion molecules Dachsous and Fat, which orient PCP 
signalling, are required in the Drosophila wing”’. However, core com- 
ponents of PCP signalling (for example, Dishevelled, Frizzled) have 
not been implicated in polarized cell division in the Drosophila wing. 
Note, however, that Frizzled controls orientation of the mitotic 
spindle during division of the sensory organ precursor in the 
Drosophila notum”’. Moreover, PCP signalling controls polarized cell 
divisions in vertebrates. The search for signals controlling oriented 
cell divisions is thus still ongoing in Drosophila and other organisms. 
Cell division and cell shape. Changes in cell shape have also been 
proposed to drive tissue extension. In an epithelial layer cells adopt 
characteristic polygonal shapes dictated largely by the interplay 
between adhesion and cortical tension’*. Cell adhesion mediated by 
cadherins tends to increase cell contacts whereas cortical tension 
exerted by the actomyosin network reduces them. This is remarkably 
illustrated in post-mitotic tissues, such as the pupal Drosophila retina, 
where differential adhesion mediated by E- and N-cadherin controls 
the shape of cone cells”. In pupae, wing cells remodel their irregular 
contacts to produce a highly ordered hexagonal tiling by a mech- 
anism implicating E-cadherin trafficking and PCP signalling*’. In 
remodelling epithelia, cells may change shape markedly. Epithelial 
cell elongation accompanies several tissue extension processes such 
as Drosophila dorsal closure*' and imaginal disc evagination*”. The
underlying mechanisms remain unclear. 

What is the effect of cell division on cell shape? During division 
epithelial cells exhibit a rounder (less polygonal) morphology, but 
live imaging has shown that cell contacts are not remodelled and 
daughter cells remain in contact*’. This explains the old observation 
that clones remain compact in imaginal discs. Defects in the even 
distribution of E-cadherin after cell division lead to a disruption of 
cell contacts and to cell scattering**. Remarkably few constraints on 
the process of cell division (such as the production of two new 
vertices at each round of mitosis) conspire to produce a single topo- 
logical equilibrium with a majority of hexagons**”’, without assump- 
tions on the mechanics of cell shape**. Heterogeneities in the rate of 
cell division locally affect the distribution of cell shape. 
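
The counting argument behind the hexagonal equilibrium can be sketched numerically. The snippet below is a mean-field illustration only, not the model used in the cited studies. It assumes that three cells meet at every vertex and that every cell divides once per round; the function name `mean_sides_after` is hypothetical. Each division adds two vertices, which raises the summed side count of the tissue by six (four for the two daughters taken together, and one for each of the two neighbours that gain a side), while the number of cells doubles. The mean number of sides s therefore updates as s/2 + 3 per round.

```python
def mean_sides_after(rounds, initial_mean_sides):
    """Track the mean number of sides per cell over rounds of division.

    Mean-field assumption: total sides S grows by 6 per division and the
    cell number N doubles each round, so s = S / N maps to s / 2 + 3.
    """
    mean_sides = initial_mean_sides
    trace = [mean_sides]
    for _ in range(rounds):
        mean_sides = mean_sides / 2 + 3
        trace.append(mean_sides)
    return trace


# Start from a hypothetical tissue of four-sided cells.
trace = mean_sides_after(rounds=10, initial_mean_sides=4.0)
```

Whatever the starting value, the deviation from six halves at every round, so the mean converges on six within a few divisions. This shows only why the average settles at six; the full distribution of polygon classes requires a more detailed model.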

Thus, the shape of cells in a growing tissue is influenced by cell
surface mechanics and by local cell division rates.
Cell rearrangements and intercalation. Another major mechanism 
driving tissue extension is cell intercalation, whereby cells change posi- 
tion by remodelling their adhesive contacts. The evagination of pupal 
imaginal wing and leg discs, for instance, was proposed early on to stem 
from changes in the organization of cell contacts**. Intercalation has 
been carefully studied during elongation of the embryo, called germ- 
band elongation*®” (Fig. 2d). In this system, contacts between antero- 
posterior neighbours shrink (Fig. 2d, red) and new contacts are formed 
at a perpendicular axis (Fig. 2d, blue). This process does not depend on 
external forces exerted at tissue boundaries, but on the local increase in 
cortical tension imposed by the enrichment of Myosin-II at shrinking 
junctions”’. Adhesion is also probably downregulated, as Bazooka (also 
called Par-3)**—a determinant of E-cadherin stabilization*?*°—and 
E-cadherin” are downregulated in shrinking junctions. Planar junction
remodelling and intercalation are controlled by embryonic polarity*®***".
Surprisingly, the non-canonical Wnt PCP pathway is not
required for cell intercalation during germband extension. The signals 
orienting cell rearrangements remain elusive. 

The proper shaping of a growing organ thus requires that, as 
new cells are formed, their relative positions be controlled. This is 
achieved by regulation of the cell division orientation and of cell 
rearrangements. Cell division itself and cell mechanics thus underlie 
important aspects of tissue shaping. A complete understanding of the 
coordination of tissue size and shape must integrate the regulation of 
tissue growth by signalling pathways with the mechanics and 
dynamics of morphogenesis at the cellular level. 
Feedback mechanisms coordinating size and shape


Cell division and cell growth drive tissue expansion. Yet, attaining the 
proper tissue size and shape does not simply rely on cell counting. Thus, 
a tissue-intrinsic property informs, in return, dividing cells about their 
division rate, growth or eventual death. Such a feedback mechanism is 
required to understand growth arrest and tissue shape. Stochastic fluc- 
tuations or persistent variations in growth rate could produce changes of 
an internal variable (for example, pressure or Dpp activity) that would, 
in return, affect growth rate. An inhibitory negative feedback signal can 
have a stabilizing effect, smoothening fluctuations and providing the 
system with a dynamic control to ensure homogeneous growth. 
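
The stabilizing effect of such an inhibitory feedback can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not a model taken from the literature discussed here: the internal variable is represented by a hypothetical 'pressure' (the relative overgrowth of a region compared with the tissue average), and the base rate, noise level and feedback gain are arbitrary illustrative values.

```python
import random


def simulate_growth(feedback_gain, steps=200, n_regions=20, seed=0):
    """Grow n_regions of a tissue with noisy growth rates and return final sizes.

    All parameter values are illustrative assumptions, not measurements.
    """
    rng = random.Random(seed)
    sizes = [1.0] * n_regions
    base_rate = 0.02      # assumed mean growth per step
    noise_sd = 0.01       # assumed stochastic fluctuation of the growth rate
    for _ in range(steps):
        mean_size = sum(sizes) / n_regions
        updated = []
        for size in sizes:
            noise = rng.gauss(0.0, noise_sd)
            # Hypothetical internal variable: relative overgrowth of this region.
            pressure = (size - mean_size) / mean_size
            # Inhibitory feedback: overgrown regions slow down, lagging ones catch up.
            rate = base_rate + noise - feedback_gain * pressure
            updated.append(size * (1.0 + rate))
        sizes = updated
    return sizes


def coefficient_of_variation(values):
    """Standard deviation divided by the mean: a measure of heterogeneity."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return variance ** 0.5 / mean


if __name__ == "__main__":
    for gain in (0.0, 0.2):
        cv = coefficient_of_variation(simulate_growth(gain))
        print(f"feedback gain {gain}: size heterogeneity (CV) = {cv:.3f}")
```

With the gain set to zero, fluctuations accumulate like a random walk and regions drift apart in size; with a positive gain, the same sequence of fluctuations is continually corrected and growth stays more homogeneous.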

What mechanisms could generate a feedback? Two plausible alter- 
natives have been proposed. Local regulation of the morphogen- 
ligand or activity gradient might be a way. Regions of enhanced 
growth could locally reduce the slope of the Dpp gradient, and 
hence feed back on growth (Fig. 3a). Quantitative analysis of the 
establishment of the Dpp ligand gradient'***** showed that it forms 
on short timescales (minutes) not commensurate with the long 
timescales (hours) of tissue growth, consistent with the fact that the ligand 
gradient does not scale with the tissue’’. However, the activity 
gradient (monitored by phosphorylated Mad or expression of target 
genes) is clearly influenced by the local tissue growth”. Further tests 
of this model will thus require a better characterization of the 
temporal lag between ligand and activity gradients and of the effect of 
growth on the latter. 

Figure 3 | Models of size control by feedback. a, b, Gradient model. a, A 
spatial gradient in Dpp signalling controls growth by promoting cell divisions. 
A local increase in the slope of the gradient (between dashed lines) would 
locally stimulate increased proliferation (outlined in red in the schematic 
diagram of epithelium), resulting (below) in the insertion of new cells and 
tissue expansion. The model proposes that growth changes the local slope of 
the gradient and thereby feeds back on the activity profile of the morphogen 
gradient. b, Evolution of the morphogen activity (not necessarily the ligand) 
gradient profile: as the tissue grows, the Dpp activity gradient scales with the 
tissue. Growth arrest is triggered when the gradient becomes too flat. 
c, d, Mechanical feedback model. c, Growth is proposed to be influenced by 
mechanical forces, with compression (in red) inhibiting cell division and mild 
stretch (blue) promoting it. Above extreme stretch or compression, cells die by 
apoptosis. This mechanical feedback explains uniform cell divisions in a 
growing tissue. d, The combination of this mechanical feedback and the graded 
growth-promoting function of the Dpp gradient above a threshold value 
(green) explains uniform growth and homogeneous growth arrest. Cells at the 
periphery stop dividing when, as a result of growth, they fall below this 
threshold Dpp activity. Simultaneously, cells in the centre no longer divide 
because of the increased compression (dark red) imposed onto them by cell 
division arrest at the periphery. x indicates spatial coordinates. 

The interplay between cell mechanics and the cell cycle is 
another potential way to provide dynamic regulation of tissue 
growth, as recently suggested in the Drosophila ovary*’. Indeed, an 
inhibition of growth by mechanical compression (and stimulation by 
stretch, Fig. 3c) would provide a negative feedback to suppress 
heterogeneities of growth. Using quantitative modelling, Shraiman 
proposed that mechanical feedback could account for the uniformity 
of cell division*”. Moreover, combining mechanical feedback with the 
growth-promoting function of a non-scaling Dpp gradient predicts 
growth arrest and scale invariance'”** (Fig. 3d). 
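
One ingredient of this prediction, the arrest of division at the periphery, can be sketched numerically. The sketch assumes an exponential activity profile with a fixed decay length (the functional form, the function names and all numbers are illustrative assumptions); it does not model the compression of central cells, which is the second ingredient of the combined model.

```python
import math


def arrest_distance(peak_activity, decay_length, threshold):
    """Distance from the source at which an assumed exponential profile,
    peak_activity * exp(-x / decay_length), falls to the threshold."""
    if peak_activity <= 0 or decay_length <= 0 or threshold <= 0:
        raise ValueError("all parameters must be positive")
    if peak_activity <= threshold:
        return 0.0  # activity is below threshold everywhere
    return decay_length * math.log(peak_activity / threshold)


def peripheral_cells_divide(half_width, peak_activity, decay_length, threshold):
    """True while cells at the tissue edge still see supra-threshold activity."""
    edge_activity = peak_activity * math.exp(-half_width / decay_length)
    return edge_activity > threshold


if __name__ == "__main__":
    # Illustrative values only: arbitrary activity units, lengths in micrometres.
    limit = arrest_distance(peak_activity=1.0, decay_length=20.0, threshold=0.1)
    print(f"peripheral division stops beyond about {limit:.1f} micrometres")
```

Because the gradient does not scale with the tissue in this sketch, the distance at which peripheral cells fall below threshold depends only on the gradient parameters, not on how fast the tissue reached that size.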

This opens up new perspectives and prompts a better integration 
of the cellular and signalling aspects of morphogenesis in fly and 
other organisms. A better understanding of the causal relationships 
between growth and activity gradient dynamics will be important to 
probe further how morphogens orchestrate size and shape. Whether 
morphogens also control cell division orientation and cell rearrange- 
ments remains an open and major question to investigate. It will also 
be important to test the mechanical feedback model: do stretch and 
compression influence cell division and survival? Do fields of forces 
constrain tissue growth in parallel with growth factors? This mech- 
anical feedback could have other implications on organ shape. It 
could orient cell division—as was suggested in plants*”—or cell rear- 
rangements. These important discoveries in Drosophila should 
prompt further studies testing how they apply to size and shape 
control in mammals. 
Into the mind of a fly 

Where do animal behaviours come from and are they controlled by genes? This is the fundamental question posed by the 
field of neurogenetics. Pioneering work from the 1960s in Seymour Benzer's laboratory demonstrated for the first time 
that Drosophila melanogaster fruitflies could be mutated to obtain animals with insomnia, learning disabilities and 
homosexual courtship behaviours. 


Forty years ago, the field of Drosophila neurogenetics (defined 
in Box 1) was born in Seymour Benzer’s laboratory at 
Caltech’. In the mid-1960s, Benzer made an abrupt and 
orthogonal turn from his early ground-breaking work defining 
the fine structure of the gene in bacteriophage’ to the heretical 
idea that single genes could control behaviour in complex animals. A 
paper appearing in the September 1967 issue of Proc. Natl Acad. Sci. 
presented the first evidence that mutant flies defective in phototaxis 
behaviour, or locomotor responses to light, could be identified’. The 
premise of neurogenetics—widely disbelieved at the time—was that 
complex behaviours such as the ability to learn and remember, the 
internal biological rhythms of the body, and courtship and sexuality 
could all be under genetic control. 

Drosophila melanogaster as an experimental organism has contri- 
buted much to contemporary neurobiology. The first cloning of a 
structural gene for a potassium channel was achieved by Benzer’s 
trainees, Yuh Nung Jan and Lily Jan, when they isolated the gene 
corresponding to the shaker (sh) mutant’. The founding member 
of the now enormous transient receptor potential (trp) ion channel 
family had its origins as a fly mutant defective in light-evoked retinal 
electrophysiology*. Vertebrate and invertebrate TRP channels have 
since turned up in biological processes as diverse as the sensation of 
odours, tastes, pungent compounds such as wasabi, capsaicin and 
menthol, cold, heat, touch and hearing, among others (reviewed in 
ref. 5). Beginning in the late 1960s, William Pak amassed a large 
collection of mutants defective in visual signal transduction, such 
as the neither inactivation nor afterpotential (nina) mutants®. This 
genetic dissection of phototransduction in Drosophila enabled later 
molecular analysis of the molecules underlying visual signal trans- 
duction in the laboratories of Pak, Gerald Rubin, Charles Zuker and 
others (reviewed in ref. 7). 

The cloning of sh and trp are excellent examples of the power of 
neurogenetics. Both arose from genetic screens designed to test 
the hypothesis that studying shaky flies or flies with altered 
retinal physiology would lead to interesting insights into neural 
function. The tools of this discipline are simple and require only a 
suitable behavioural paradigm (three are shown in Fig. 1), a means 
to make flies with mutations in single genes, and standard mole- 
cular genetic techniques to progress from a mutant phenotype to a 
genotype. 

This Review article will discuss how the revolution started by 
Benzer and his students in 1967 has spread to many fields of neuro- 
biological investigation in Drosophila, from whence it jumped to 
mice, zebrafish and other species, including humans. Here I will focus 
specifically on three original discoveries in Drosophila neurogenetics 
and behaviour—biological rhythms, sexual courtship and chemor- 
eception—and how these have blossomed in the last 40 yr. 


The genetics of circadian rhythms in the fly 

Flies, like all other animals and plants on earth, have a daily routine in 
synchrony with the rhythms of the Sun and Earth. Like humans, flies 
tend to wake up around dawn, enjoy a siesta in the afternoon, and are 
largely inactive after nightfall*. The biological rhythm in locomotor 
activity recurs on a roughly 24 h cycle, hence it is termed a circadian 
(circa diem—around a day) rhythm. This modulation of locomotor 
behaviour is driven by external environmental rhythms, but can also 
persist in flies raised for generations in the dark’. 

Ronald Konopka in Benzer’s laboratory provided the first evidence 
that the biological clock was under genetic control and could 
be broken by mutagenesis’®. In an elegant and simple screen for 
flies with altered hatching and locomotor rhythms, using activity-
monitoring devices such as those in Fig. 1b, Konopka and Benzer 
isolated three different mutant alleles of the same gene, called period 
(per). per0 flies are insomniac, perS flies live a short day, and perL flies a 
long day’®. An amazing parallel with humans was uncovered with the 
recent identification of mutations in a human period homologue 
(PER2) as the genetic culprit behind the perS-like phenotype in familial 
advanced sleep-phase syndrome"! 

Figure 1 | A diversity of behaviour paradigms is used to measure 
Drosophila behaviour in the laboratory. a, The olfactory T-maze is used for 
Pavlovian olfactory conditioning’. Flies are trained to associate odour A 
(orange) with electric shock (left). During testing, these flies avoid odour A 
(right). The assay is carried out with reciprocal training, such that only one 
half of the paradigm is depicted here. Flies are depicted as small black dots. 
b, Circadian-activity monitors measure locomotor activity of individual flies 
using an infrared beam (red dotted line)'®. An external computer tracks the 
number of times the fly breaks the beam, allowing continual monitoring of 
fly locomotor activity over a period of weeks. c, The courtship wheel permits 
the observation of up to 10 fly couples in which the male engages in such 
stereotyped sexual activity as the following: genital licking, wing vibration to 
produce a species-specific song, and copulation. Graphic in a is adapted from 
figure 1 of ref. 65 with permission from Elsevier (copyright 2004). 

Box 1 | The subject 

Neurogenetics merges concepts and techniques from neurobiology 
and genetics to study the genetic basis of behaviour and neural 
function. By generating and studying mutant animals that exhibit 
abnormal behaviour, mistakes in neural wiring, or anomalies in the 
structure or function of neurons, neurogeneticists can track down the 
genes responsible for these phenotypes, thereby understanding the 
function of the genes in producing a normal brain and its associated 
behaviours. In 1967, Seymour Benzer suggested that neurogenetics 
could act as a ‘microsurgical tool’ to study the brain: 

“Thus, use of mutation as a microsurgical tool could conceivably 
lead to the identification of the various transmitters, about which little 
is presently known. The counter-current procedure is obviously 
adaptable to a wide range of stimuli, such as gravity, odour, sound and 
special visual patterns, thus lending itself to the isolation of many kinds 
of behavioural mutants, including ones in which the wiring pattern of 
the nervous system is affected. Furthermore, as preliminary 
experiments have shown, the speed of the procedure permits its use in 
the study of short-term modifications of behavior.” 

How a single gene could both be necessary for the clock and set 
its running speed remained a mystery until the age of molecular 
cloning and the isolation of other clock genes. The groups of 
Michael Rosbash and Jeffrey Hall, and the group of Michael Young 
cloned per in 1984 (refs 12, 13). Both per messenger RNA and PER 
protein were subsequently shown to cycle with a circadian rhythm, 
and show a rhythmic nuclear accumulation, prompting a model in 
which PER acts as a feedback suppressor to control the clock'*. per 
turned out to be just the tip of an enormous iceberg of clock genes, 
clock accessory genes, and clock-controlled genes. The present model 
for the Drosophila clock includes a host of core clock components 
that include a positive transcriptional feedback loop (Clock (Clk), 
cycle (cyc) and vrille (vri)), a negative transcriptional feedback loop 
(per and timeless (tim)), and factors that modulate the light-regulated 
accumulation and output of the core clock genes (double-time (dbt, 
also known as discs overgrown or dco), shaggy (sgg), cryptochrome 
(cry), Pigment-dispersing factor (Pdf) and others; reviewed in ref. 8). 
Some recent surprises in the clock field include a somewhat myster- 
ious cytoplasmic timing mechanism that regulates the delay in 
nuclear accumulation of period and timeless'”, as well as the discov- 
ery that the clock protein has chromatin-remodelling activity’. 
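
The logic of a feedback suppressor with a delay can be sketched in a few lines. The following is a deliberately minimal, single-variable caricature, not the published clock model: one 'PER-like' quantity represses its own synthesis after a fixed lag, and the Hill-type repression term, the explicit delay and every parameter value are assumptions chosen only for illustration.

```python
def simulate_per(delay_hours, total_hours=400.0, dt=0.01):
    """Euler integration of dP/dt = s / (1 + (P(t - delay) / K)**n) - d * P.

    Returns the list of P values, one per time step. All constants are
    illustrative assumptions and are not fitted to fly data.
    """
    synthesis = 1.0   # s: maximal synthesis rate (arbitrary units per hour)
    decay = 0.2       # d: first-order degradation rate (per hour)
    half_max = 1.0    # K: level at which repression is half-maximal
    hill = 4          # n: steepness of the repression
    lag = int(round(delay_hours / dt))
    n_steps = int(round(total_hours / dt))
    levels = [0.1]
    for i in range(n_steps):
        # Repression acts on the level present `delay_hours` earlier;
        # before that much history exists, use the initial level.
        delayed = levels[i - lag] if i >= lag else levels[0]
        production = synthesis / (1.0 + (delayed / half_max) ** hill)
        levels.append(levels[i] + dt * (production - decay * levels[i]))
    return levels


def late_amplitude(levels):
    """Peak-to-trough range over the final quarter of the simulation."""
    tail = levels[3 * len(levels) // 4:]
    return max(tail) - min(tail)


if __name__ == "__main__":
    for delay in (0.0, 6.0):
        amplitude = late_amplitude(simulate_per(delay))
        print(f"delay {delay} h: late oscillation amplitude = {amplitude:.3f}")
```

In this sketch the level settles to a steady value when repression is immediate, whereas a sufficiently long lag between synthesis and repression produces sustained cycling, which is one reason the timing of nuclear accumulation matters for a feedback clock.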

Homologues of many of the core clock genes have been identified 
in vertebrates, further validating the fly as a model for circadian 
biology. Microarray studies by Young” and others have identified 
several hundred genes under circadian control, the analysis of which 
promises to provide an integrated view of how the physiology of the 
entire organism is synchronized to the daily rhythms of the planet. 


fruitless and its power to shape sexual behaviour 


Copulation in Drosophila is preceded by an intricate series of sexually 
dimorphic pre-copulatory courtship behaviours between the male 
and female fly'® (Fig. 1c). Benzer’s trainees Hall and Yoshiki Hotta 
(Box 2) used genetic mosaic analysis to define portions of the central 
nervous system required for male courtship behaviour’””° and genes 
that governed heterosexual behaviour’'. One gene named fruitless 
(fru), identified in 1963 by K. S. Gill and cloned over 30 yr later by 
Daisuke Yamamoto”—and separately by the group effort of Hall, 
Bruce Baker and Barbara Taylor?*—is now known to be a master 
regulator of sexuality in the fly’***. The transcription factor encoded 
by the fru gene is expressed in a subset of central, peripheral, sensory 
and motor neurons in the adult fly, which are likely to comprise a 
circuit controlling sexually dimorphic behaviour®*’. Mutant fru 
males show homosexual courtship behaviour in which large groups 
form chains of males courting each other. In a remarkable experi- 
ment, Barry Dickson showed recently that male courtship behaviour 
directed at females can be induced in chromosomally female flies 
simply by expressing the male-specific isoform of fru in the female 
brain**. Recent work in the mouse from Catherine Dulac’s group 
suggests a similar underlying latency in the female mouse to exhibit 
male behaviours on manipulation of a single gene**. A major goal in 
this field is to define the molecular targets of fru and define the neural 
circuits that drive both male and female sexual behaviours. 


Olfactory communication in the fly 


Fruitflies are strongly attracted to the smell of vinegar, yeast, rotting 
fruit and to each other. The genetic basis of this chemosensory beha- 
viour was first studied by Obaid Siddiqi, a Benzer trainee (Box 2). 
Mutants defective in the olfactory T-maze, using the device in Fig. 1a 
but omitting electric shock, as well as other olfactory behaviour para- 
digms were collected throughout the 1980s by Siddiqi”, and later by 
John Carlson*’, and others. One of the Carlson mutants, acj6, proved 
to be a key transcription factor necessary for the regulation of a subset 
of odorant receptor genes*'. The availability of the genome sequence 
of Drosophila melanogaster opened this system to rapid molecular 
analysis by Carlson, Dean Smith, Liqun Luo, Dickson, Richard Axel 
and a number of former Axel trainees, resulting in the complete 
description of the sequence and expression of all 62 odorant recep- 
tors and 68 taste receptors (reviewed in ref. 32), the complete map of 
connectivity of primary olfactory centres****, an initial view of how 
primary olfactory information is mapped in the higher brain®***, and 
a comprehensive survey of ligand tuning of a majority of the odorant 
receptors’’, including those tuned to pheromones**”’. A major effort 
in this growing field is to understand the underlying central mechan- 
isms by which a fly discriminates among all the odours it is able to 
detect and how the circuitry underlying pheromone perception leads 
to stereotyped behaviours. 


A myriad of other complex behaviours 


Beyond these brief examples from the original neurogenetic studies 
to emerge from the Benzer laboratory, many other behaviours have 
been productively dissected with genetic and behavioural tools in 
Drosophila. The seminal work of William Quinn, William Harris 
and Benzer (Box 2) demonstrating that flies can be conditioned to 
avoid an odour paired with shock (Fig. 1a)*°, was followed by the 
identification of a series of mutant flies that either could not learn this 
task or rapidly forgot it (reviewed in ref. 41). Subsequent genetic 
analysis by Quinn, Ronald Davis, Tim Tully and others produced 
the provocative finding that many learning and memory defective 
mutations in the fly affect the cyclic AMP pathway (reviewed in ref. 
41), the same signalling pathway implicated in conditioned beha- 
viours in Aplysia and the mouse***’. Subsequent genetic screens for 
learning mutants by Tully and others, in one case combined with 
microarray analysis, produced a host of other candidate memory 
genes, including several involved in local control of mRNA trans- 
lation™. 

The cloning of the dunce (dnc) gene* and its enrichment in a part 
of the fly brain called the mushroom body* allowed the field to move 
from the genetic to the cellular level. Davis, Martin Heisenberg and 
others carried out a series of genetic and ablation studies strongly 
implicating this olfactory processing centre in the fly as the seat of 
memory*”**. Current work in the field is zeroing in on how fly-brain 
microcircuitry processes paired odour and shock input*’, how the 
circuitry is modulated by conditioning”, and how the processes of 
learning and retrieval of memories are compartmentalized”’. 
Neurogenetics has also enabled scientists to localize memory to smaller 
and smaller areas of the fly brain. A particularly elegant recent 
example comes from Heisenberg, Li Liu and co-workers, who localized 
circuits that learn certain visual features to two groups of 
neurons in a structure called the fan-shaped body”. Although the 
small size of Drosophila central-brain neurons has hindered 
electrophysiological access, recent work from Rachel Wilson and Gilles 
Laurent suggests that this barrier is not insurmountable”’, and a more 
detailed functional analysis of memory at the level of single 
mushroom-bodies seems likely. 

Box 2 | The people 

Box 2 Figure | The birth of neurogenetics in the Benzer laboratory at Caltech. a, Benzer laboratory at Caltech, November 1971. Front row (left to right): 
O. Siddiqi; research technicians, Y.-H. Jing and J.-Y. Yu; M. Deniro; R. Konopka; D. Kankel; and laboratory manager, E. Eichenberger. Back row (left to 
right): T. Hanson, D. Edgington, Y. Hotta, J. Lewis, P. Christensen and W. Quinn. b, Benzer laboratory at Caltech, around 1972. Front row (left to right): 
W. Quinn, D. Edgington, J. Hall, S. Benzer. Back row (left to right): W. Harris, D. Kankel and research technicians, J. Gorn and B. Butler. Photos courtesy of 
S. Benzer, Caltech. 

Career paths of selected Benzer laboratory members (interested readers can learn more about the history of neurogenetics and the Benzer 
laboratory by consulting J. Weiner's celebrated book°®°): 

• Seymour Benzer: still active and scientifically prolific at the age of 86 as the James Griffin Boswell Professor of Neuroscience, Emeritus (active) 
at Caltech. In recent years, his group has studied longevity, brain degeneration, fear, and feeding behaviours in Drosophila. 

• Jeffrey Hall: Professor at Brandeis University, who was inducted into the US National Academy of Sciences in 2003 for his comprehensive work 
on the neurogenetics of circadian, courtship and social behaviours in Drosophila. 

• William Harris: fellow of the Royal Society and Head of the Neuroscience Department at the University of Cambridge, UK, with a group working 
on the molecular embryogenesis of the vertebrate visual system. 

• Yoshiki Hotta: Director of the National Institute of Genetics in Japan, Hotta went on to a prominent career in Drosophila neural development. 

• Douglas Kankel: Professor at Yale University investigating the neurogenetics of visual and nervous system development in Drosophila. 

• Ronald Konopka: continued to work on biological clocks at Clarkson University before retiring from science. 

• William Quinn: Professor in the Department of Brain and Cognitive Sciences at Massachusetts Institute of Technology, who has continued his 
seminal work on learning and memory in Drosophila. 

• Obaid Siddiqi: founding Director of the TIFR National Centre for Biological Sciences at Bangalore and inducted in 2003 as a foreign member of 
the US National Academy of Sciences, Siddiqi pioneered the field of behaviour genetics of Drosophila olfaction after leaving the Benzer laboratory. 

Ulrike Heberlein has turned the fly into a genetic model for alcohol 
intoxication, demonstrating that flies exhibit progressive and eerily 
human-like responses to acute alcohol exposure: first, they become 
hyperexcitable, then they lose coordination, and finally they pass 
out™. Some of the same cAMP pathway genes required for learning 
and memory affect a fly’s sensitivity to alcohol”. 

Both Edward Kravitz, who studies aggression in lobster, and Ralph 
Greenspan have recently turned their attention to fly aggression. Flies, 
both male and female, exhibit aggressive behaviours, with males fight- 
ing other males in the presence of a female and females jousting with 
females over food resources”. Kravitz has shown that fighting style 
differs between the sexes and is controlled by fru (ref. 57). Multi- 
generational selection for aggressive or docile strains has been achieved 
by Greenspan, and such strains have been analysed by whole-genome 
microarray to identify a large number of genes, the expression of 
which is modulated differentially in aggressive strains’. These genes 
will provide avenues for future investigation into the genetic basis of 
aggression. 

A behavioural paradigm recently pioneered by Roland Strauss is 
that of gap-crossing (Fig. 2). Flies are presented with gaps of varying 
widths, from narrow and easy to cross to unbreachable chasms, and 
make sophisticated estimations of which gaps can be reasonably 
crossed”’. This goal-directed climbing behaviour is useful to dissect 
motor planning and coordination, and to identify the circuits in the 
fly brain that estimate distance, but could, in principle, also lead to 
mutants with altered appetite for risk. It is conceivable that both risk- 
averse flies, capable of crossing a gap but choosing not to, and reckless 
flies, those choosing to cross impossibly wide gaps, could be iden- 
tified through genetic screens. 

Unlike the classic eusocial insects such as ants and bees, flies are 
not typically known for their group dynamics. This view has been 
changing somewhat on closer behavioural investigation, which 
has revealed some surprising evidence of social interactions in 
Drosophila. For instance, Joel Levine and Hall have shown that cir- 
cadian rhythms can be phase-shifted by the odour of flies living in 
another time zone or flies of another genotype. Hubert Amrein 
showed that normal circadian locomotor activity of a male is dras- 
tically affected by the presence of a female®'. These experiments hint 
at as yet unknown volatile chemical substances produced by other 
flies and detected by the olfactory system, and suggest that social 
interactions shape group interactions in the fly. In fact there is a 
growing trend to monitor Drosophila behaviour in more natural 
and enriched contextual environments that mimic those they might 
encounter in the real world. For instance, Levine has been observing 
fly social interactions in groups in the presence of food (Fig. 3). Free 
from the constraints of courting a single female in an austere Plexiglas 
chamber, as is the norm for observing courtship behaviour (Fig. 1c), 
Drosophila males in group situations seem to engage in complex 
group sex that combines foreplay, copulation and feeding behaviours 
(Fig. 3). It will be interesting to study the regulation and modulation 
of such group social behaviours and the importance of context in 
regulating them. 
Figure 2 | Goal-oriented behaviour is measured in a gap-climbing 
paradigm, in which the fly estimates the width of the gap and judges if it 
seems feasible to cross. Photo by S. Pick, kindly provided by R. Strauss. 
Reprinted from ref. 59 with permission from Elsevier (copyright 2005). 


A related example of a behaviour that emerges in groups is the 
innate avoidance that flies show for an empty tube previously occu- 
pied by flies that experienced stress. Avoidance by naive flies of tubes 
previously occupied by shaken flies was first noticed by Benzer in 
1967 (ref. 1), and subsequently investigated by Greg Suh, working 
with the Benzer laboratory and the groups of David Anderson and 
Axel, as an innate olfactory avoidance of a Drosophila stress odor, 
dSO (ref. 62). This robust behaviour, resulting from the recognition 
and avoidance of the smell of a fellow fly in trouble, will be useful in 
future studies of the circuitry of anxiety, stress and innate fear. 


Concluding remarks 


Significant advances in our understanding of the biological clock, 
sensory systems, learning and memory, sexual courtship and many 
other behaviours have been made through neurogenetic research in 
Drosophila. With these successes behind us, some adventurous 
Drosophila neurogeneticists are moving beyond these original neu- 
rogenetics questions, which may in hindsight represent the low- 
hanging fruit—robust behaviours amenable to investigation in 
laboratory-based behavioural paradigms. It now seems possible to 
approach in the fly more complex behaviours and even emotions, the 
neurobiological basis of which is not well understood at the genetic 
or functional level in any animal: sociality, common sense, altruism, 
empathy, frustration, motivation, hatred, jealousy, peer pressure, 
and so on. The only a priori limitation to studying any of these traits 
is the belief that flies can show such emotions and the design of a 
plausible behavioural paradigm to measure them. 

This Progress article accompanies the release of complete genomes 
of eleven additional Drosophila species (D. ananassae, D. erecta, D. 
grimshawi, D. mojavensis, D. pseudoobscura, D. simulans, D. virilis, D. 
yakuba, D. persimilis, D. sechellia, D. willistoni), with vastly different 
ecologies and lifestyles to Drosophila melanogaster. What will be the 
impact of these additional Drosophila genomes on neurogenetics and 
behaviour research? Such information may begin to provide clues to 
differences in pheromonal communication and species recognition 
among these flies, some of which occupy overlapping ecological 
niches and need to pay careful attention to which species they are 
courting. A second area of interest will be in food preference and how 
this might be influenced by the evolution of smell and taste receptors. 
Are there functional differences in chemosensory reception of a fly 
with omnivorous taste as compared to a fly species with more specialized 
tastes? Hints that such phenomena are both existent and genetically 
tractable come from recent work in Bill Hansson’s group, 
which found that the Seychelles island species D. sechellia has an 
olfactory system specialized to sense its preferred food, the Noni 
fruit’. 

Figure 3 | The contextual courtship assay measures sexual behaviour 
under more naturalistic conditions in group situations on food. a, Three 
males courting a single virgin female near a wedge of food. b, Male with 
forelegs raised high copulates with a female, while another male, on his back, 
touches and licks her abdomen. This occurs on top of a wedge of food. Note 
that the female’s right foreleg stretches out across the surface of the food as 
does the left foreleg of the male beneath her. Such sexual behaviour is 
affected by the presence or absence of food in the assay. Gustatory receptors 
on the tarsi, the part of the foreleg in contact with the sweet food, are in a 
good position to sample food and may play a mechanistic role in this sexual 
interaction. Photo by N. Stepek and J.-C. Billeter, kindly provided by J. D. 
Levine, Univ. of Toronto, Mississauga. 

The little vinegar fly Drosophila melanogaster, along with its sister 
species, promises to reveal many more surprises about how the 
nervous system produces complex behaviours in the next 40 yr. 
Imaging Drosophila gene activation and 
polymerase pausing in vivo 


John T. Lis* 


Since the early 1960s, imaging studies of Drosophila sp. polytene chromosomes have provided unique views of gene 
transcription in vivo. The dramatic changes in chromatin structure that accompany gene activation can be visualized as 
chromosome puffs. Now, live-cell imaging techniques coupled with protein-DNA crosslinking assays on a genome-wide 
scale allow more detailed mechanistic questions to be addressed and are prompting the re-evaluation of models of 
transcription regulation in both Drosophila and mammals. 


Determining the complete genome sequence of humans, 
Drosophila and other model higher eukaryotes was an 
important step in cataloguing the complete code that programmes 
the development and maintenance of multicellular 
organisms. A critical challenge that remains is to determine in 
molecular terms how this code is read and regulated in higher eukar- 
yotes'. The regulation of transcription of the genome is a major mode 
by which an organism controls both its homeostasis and develop- 
ment. This regulation is executed mainly through the interactions of 
a plethora of transcription factor proteins and RNAs with each other 
and with DNA and associated histones*. These interactions then 
dictate when, where, and to what level specific genes are transcribed. 
Although biochemical and genetic approaches have identified many 
of the macromolecular players and their activities, a mechanistic 
understanding of how they operate in gene regulation can be guided 
and tested by direct imaging of these molecular interactions and the 
resulting biochemical processes in living cells. 

Two developments in the imaging of transcription factors at 
specific endogenous gene loci in vivo are providing new views of 
transcription mechanisms and regulation. One, currently specific 
to Drosophila, makes use of state-of-the-art optics combined with 
the natural amplification of signals in tissue containing polytene 
chromosomes, allowing investigation of molecular interactions and 
dynamics at specific loci in real time in living nuclei*. The other, 
although having a history in Drosophila‘, is species-general and uses 
crosslinking in vivo—that is, chromatin immunoprecipitation 
(ChIP) assays and variants thereof—to produce a ‘molecular image’ 
of specific protein interactions and chromatin modifications with 
particular DNA sequences in the genome’. Here I describe how these 
optical and molecular imaging approaches are providing new 
insights into transcription and the rate-limiting steps in its regulation 
in multicellular organisms. In particular, recent genome-wide ChIP 
assays in Drosophila*’? and mammals’ support a paradigm shift in 
gene regulation by indicating that control of transcription elongation 
by RNA polymerase II (Pol II) in a promoter-proximal pausing
model (Box 1), rather than control of the recruitment or initiation 
of Pol II, is the rate-limiting step for a large fraction of highly regu- 
lated genes. 


Early insights from spread polytene nuclei 


Over the past several decades, Drosophila has provided extraordinary 
views of chromosome structure and gene regulation. Simple phase
microscopy of fixed and spread polytene nuclei provided early
Drosophila geneticists with a high-resolution view of interphase chro- 
mosomes and a physical framework for their genetic maps’’. This 
allowed genes to be positioned relative to the characteristic band and 
interband patterns seen along the lengths of each chromosome 
(Fig. 1). Particularly striking are the changes in polytene chro- 
mosome structure that occur when genes become transcriptionally 
activated’*. High levels of transcription activation result in the 
decondensation of chromatin at specific genetic loci to form inter- 
bands and, in some cases, large distended chromosome puffs. These 
puffs provide cytological landmarks along each chromosome that 
come and go during the course of development, with each new set 
of puffs appearing, and old puffs disappearing, in response to waves 
of activation and repression of gene transcription (Fig. 1)". 

Environmental signals also activate the transcription of sets of 
genes, the activity of which can be observed as chromosomal puffs. 
The initial discovery in 1962 of the rapidly and highly activated heat 
shock response was not made by biochemists observing the induction 
of new messenger RNAs and proteins, but rather by a cytologist who 
was impressed by a striking new puffing pattern’’. These observations 
of dramatic changes in cytology (Fig. 2) foreshadowed the changes in 
the molecular composition and dynamics of genes that would be 
uncovered by many laboratories in the decades that followed’. 

The genes that underlie these heat-shock-activated chromosomal 
puffs form a frequently used model for investigating mechanisms of 
gene activation and the accompanying changes in chromatin and 
chromosome structure. The staining of polytene chromosomes with 
antibodies to specific transcription factors’* allowed these factors to 


Box 1| Pausing and stalling 


Here we use the term ‘paused RNA polymerase II (Pol II)’ to link to 
historical origins and because of properties in nuclear run-on assays” 
and relationships to some features of prokaryotic pausing that have 
been extensively characterized*°. However, the more general term of 
‘stalled’ is also very useful®?, especially because these populations of 
promoter-proximal Pol II will probably contain an equilibrium mix of 
both paused and arrested Pol II. Nuclear run-on assays are a key
technique used to define the state of Pol II. The assays are performed in
isolated nuclei and allow elongationally competent Pol II to extend 
existing RNA chains with labelled nucleotide; this nascent RNA can be 
identified and quantified by hybridization to specific genomic 
sequences. 


be visualized at these readily inducible loci and, more globally, over
the entire genome. The molecular composition and chromatin archi- 
tecture of these loci, and the changes triggered by the activation of 
these genes, have often proven to be general features of other 
developmentally activated loci’. 


Live-cell imaging during gene activation 


The ability to attach green fluorescent protein (GFP) and related 
fluorescent tags on chromosomal proteins has added a vital temporal 
dimension to the analysis of protein-DNA and protein-protein 
interactions in nuclei and on chromosomes. Expression of GFP- 
tagged proteins in mammalian cell lines allows the dynamics of tran- 
scription factors and chromatin components to be examined in 
nuclei in real time'®’’. Fluorescence recovery after photobleaching 
(FRAP) experiments additionally reveal the dynamics of a nuclear 
protein within an arbitrarily chosen nuclear region", or at a specific 
transgenic locus containing tandemly repeated genes'®. Examination 
of specific, native gene loci requires more sensitivity and higher- 
resolution views of interphase nuclei. The giant interphase nuclei 
of Drosophila polytene tissue provide both the sensitivity and effec- 
tive resolution to detect signals from specific chromosomal loci’. 
The advent of new optical-sectioning capabilities added further 
power to the imaging of intact nuclei. Computational deconvolution 
methods were used initially with wide-field microscopy to optically 
section Drosophila polytene nuclei’. Confocal, spinning disk and 
multi-photon microscopy further enhanced sectioning capabilities, 
with multi-photon microscopy being particularly effective in min- 
imizing photodamage and providing high effective-resolution of 
thick biological specimens’. By applying these approaches to
Figure 1 | Puffing patterns on chromosome 3L during Drosophila larval
development. The chromosomes in a—e were isolated from progressively 
later developmental stages of third instar larvae. The light-staining 
distended puffs represent regions of high transcription activity. As 
developmental genes are turned on and off, the puffs appear and disappear. 
Reproduced with kind permission of Springer Science and Business Media 
from ref. 12.
Figure 2 | Heat-shock-induced puffing at major heat shock loci 87A and C.
Displayed is a small segment of fixed chromosome 3 before (top) and after
(bottom) heat shock. Chromosomes are stained for DNA (Hoechst dye;
blue) and for Pol II (green)”’. HS, heat shock. 


Drosophila polytene nuclei, it is possible to observe protein interac- 
tions with specific endogenous genetic loci in real time (Fig. 3). This 
was invaluable in directly tracking heat shock transcriptional activ- 
ator (HSF) movement from nucleoplasm to specific heat shock gene 
loci (Hsp70), and for demonstrating that activated HSF remains 
bound in a non-exchanging state for many dozens of cycles of tran- 
scription’. These optical strategies should prove to be critical in 
exploring challenging mechanistic questions concerning transcrip- 
tion regulation, as discussed later. 


Molecular imaging of proteins on genes in vivo 
As a complement to the microscopic views of nuclear protein dis- 
tributions on polytene chromosomes, molecular approaches using 
protein-DNA crosslinking and immunoprecipitation of protein— 
DNA complexes (ultraviolet (UV)-ChIP and formaldehyde-ChIP) 
were developed in the mid-1980s to provide higher-resolution 
images of the molecular architecture of proteins and genes in vivo*°. 
Applying these methods both before and during the time course of 
gene activation identified the timing and location of numerous pro- 
tein interactions at high resolution (~200 base pairs). The imme- 
diate, synchronous and robust heat shock response allows the heat 
shock genes to be examined before, and in the seconds that follow, 
stimulation, such that changes in chromatin and factor recruitment 
can be followed as a wave that passes along the length of these tran- 
scription units’. Additionally, the role of particular factors in these 
processes can be readily assessed using existing mutants or inhibitory 
drugs or by RNA interference knock-down strategies’. 

An early and surprising finding, seen initially by UV-ChIP”', was 
that Drosophila heat shock genes had a polymerase associated with 


Figure 3 | Two-photon image of living salivary gland nuclei. DAPI
(4′,6-diamidino-2-phenylindole)-stained DNA of a single nuclear section (a), a
three-dimensional reconstructed nucleus (b), and GFP–Pol II at heat shock
puffs (c; Pol II, green)’.
their promoters before activation, and that this associated Pol II was
engaged in transcription and competent for elongation as deter- 
mined by nuclear run-on assays (Box 1)”. These findings seemed 
to contradict the widely held view that the recruitment of Pol II or 
the initiation of transcription was the rate-limiting step in gene 
activation. Additional support for the existence and characterization 
of the nature of this Pol II complex came from the mapping of
promoter melting with KMnO4 (ref. 23), which revealed melted
DNA in the regions of heat shock genes residing 20-50 base pairs 
downstream of the transcription start sites. Also, the isolation and 
sizing of the chain-terminated run-on transcripts™* provided near- 
nucleotide-resolution mapping of pause sites to this same region, as 
two peaks separated by 10 base pairs. Interestingly, the RNAs are 
progressively more 5’-capped as they progress through the pause 
region, suggesting that capping occurs as soon as the RNA emerges 
from Pol II. 

Two protein complexes—DRB sensitivity-inducing factor (DSIF; 
made up of Spt4 and Spt5) and negative elongation factor (NELF;
made up of five subunits)—were found to cooperate to repress tran- 
scription elongation in vitro’, and their negative effects could be 
overcome by P-TEFb (made up of Cdk9 and CycT) kinase’’. Using 
ChIP assays, both DSIF and NELF were found to be located along 
with paused Pol II in the promoter-proximal regions of uninduced 
heat shock genes (Fig. 4)*”’. The P-TEFb kinase is recruited to active 
genes where it overcomes the negative effects through its kinase 
activity, which can phosphorylate DSIF, NELF and Ser 2 residues of 
the carboxy-terminal domain of the largest subunit of Pol II*”*. 
Interestingly, P-TEFb recruitment to heat shock genes depends on 
the heat shock activator HSF, but HSF does not seem to bind directly 
to P-TEFb’*. Some, but not all, activators have been shown to interact 
with P-TEFb, so other mechanisms of P-TEFb recruitment need to be 
considered*®. The transition into productive elongation seems to 
correlate with the loss of NELF from the promoter’’. In contrast, 
DSIF remains associated with productively elongating Pol II and is 
thought to have a positive role after escape from the pause”’*. Paused 
polymerases are susceptible to backtracking, and the presence of
Figure 4 | Minimal model of Pol II pausing at Hsp70 genes before and after
heat shock. For a more complete description of factors associated with heat 
shock genes see the more comprehensive review’. a, Prior to heat-shock, 
paused Pol II, which is partially phosphorylated at Ser 5 residues (green P) of 
the carboxy-terminal domain, is in a complex with DSIF and NELF 
complexes and occupies a region between 20-40 base pairs downstream of 
the start site of Hsp70. GAF is a sequence-specific binding factor that is 
present before activation. b, HSF is the key activator protein that trimerizes 
and binds with high affinity to its DNA elements in response to heat shock. 
Both DSIF and TFIIS (IIS) are part of both the paused and the fully 
competent Pol II elongation complexes. P-TEFb is the kinase that is critical 
for the maturation of paused Pol II into a productive elongation product, 
and it phosphorylates DSIF, NELF and the Ser 2 residues (blue P) of the 
carboxy-terminal domain of Pol II’*.
elongation factor TFIIS at the pause region in vivo in Drosophila
may stimulate the intrinsic RNA cleavage activity of Pol II to create 
a new RNA 3’ end in the active site, and thereby maintain a popu- 
lation of elongationally competent complexes”. 


Defining a Pol II as ‘promoter paused’


The presence of a peak of Pol II at the promoter is not sufficient to 
identify a Pol II as paused. Although such a distribution is indicative 
of Pol II promoter recruitment not being rate limiting in transcrip- 
tion, this Pol II could be in a pre-initiation complex or at some other 
post-recruitment step that precedes pausing. Additional criteria need 
to be applied*®. Nuclear run-on assays can demonstrate that the 
detected Pol II is transcriptionally engaged, particularly if performed
in the presence of Sarkosyl or high salt, which blocks new initiation 
and seems to remove barriers to elongation”. The isolation and sizing 
of intentionally terminated run-on RNAs show, at near-nucleotide- 
resolution, the location of the paused RNAs in vivo, and have the 
additional benefit of determining the capping signature, which in 
the cases examined is absent from the earliest pause sites, but is 
nearly complete by position +30 (ref. 24). Promoter-melting assays 
using KMnO4 can be performed on intact cells for short periods
(30 s) to provide a snapshot of the reactivity of T residues, the
hyper-reactivity of which is indicative of Pol II-melted DNA”.
Whereas these melted residues tend to cluster in the region of 
20-50 base pairs and highly correlate with paused Pol II, the pattern 
of reactivity can be influenced by other proteins either protecting or 
altering the reactivity of T residues. Additional signatures can be 
detected by ChIP assays and include the presence of NELF, and 
Ser 5- but not Ser 2-phosphorylated carboxy-terminal domain of
the largest subunit (RpII215) of Pol II’”’. 
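
The criteria listed above can be collected into a simple checklist. The sketch below is purely illustrative: the `PromoterEvidence` record, its field names and the `pausing_signatures` function are hypothetical and do not correspond to any published tool. It reports which signatures are present and leaves the judgement of whether a promoter is 'paused' to the reader, because the text does not define a formal decision rule.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Region (bp downstream of the transcription start site) in which melted
# T residues tend to cluster, as stated in the text.
PAUSE_WINDOW = (20, 50)


@dataclass
class PromoterEvidence:
    """Hypothetical per-promoter summary of assay results (illustrative only)."""
    pol2_chip_peak: bool = False      # Pol II ChIP peak at the promoter
    run_on_engaged: bool = False      # nuclear run-on signal (Sarkosyl or high salt)
    melted_t_positions: List[int] = field(default_factory=list)  # KMnO4-reactive T residues
    nelf_chip: bool = False           # NELF detected by ChIP
    ser5_phospho_ctd: bool = False    # Ser 5-phosphorylated carboxy-terminal domain
    ser2_phospho_ctd: bool = False    # Ser 2-phosphorylated carboxy-terminal domain


def pausing_signatures(ev: PromoterEvidence) -> Dict[str, bool]:
    """Report which of the signatures described in the text are present."""
    low, high = PAUSE_WINDOW
    return {
        "pol2_peak": ev.pol2_chip_peak,
        "transcriptionally_engaged": ev.run_on_engaged,
        "promoter_melting": any(low <= p <= high for p in ev.melted_t_positions),
        "nelf_present": ev.nelf_chip,
        "ser5_not_ser2": ev.ser5_phospho_ctd and not ev.ser2_phospho_ctd,
    }


if __name__ == "__main__":
    # A Pol II peak alone satisfies only the first criterion.
    print(pausing_signatures(PromoterEvidence(pol2_chip_peak=True)))
```

A promoter with only a Pol II ChIP peak returns `True` for `pol2_peak` and `False` for every other signature, which mirrors the point that a peak by itself is not sufficient.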


How general is paused Pol II? 


Pol II paused on promoters was regarded initially as a feature of
the heat shock genes and a few other rapidly responsive genes. A
small set of randomly chosen Drosophila genes also seemed to
have paused Pol II, as assessed by UV-ChIP and nuclear run-on
assays*', though not at the full occupancy (1 Pol II per promoter)
seen for Hsp70 (ref. 30). Additionally, evidence supporting some 
form of elongational control in specific vertebrate genes has existed 
since the early 1980s, for example, greater nuclear run-on signals 
from 5’ portions than 3’ regions of chicken β-globin*’. Similar
data for mammalian c-Myc and c-Fos were augmented with in vitro 
studies that suggested initially a termination control further down- 
stream, but more thorough characterizations in vivo showed these 
genes to have properties like those of Drosophila Hsp70 paused 
Pol II.

More recently, genome-wide ChIP analyses of Pol II in human cells 
showed Pol II peaks in the promoter regions of a large fraction of
genes*’, and a study in human stem cells and differentiated cells 
showed that the majority of genes have peaks of Pol II that seem to 
have undergone transcription initiation, on the basis of their pattern 
of histone H3K4 trimethylation (a modification linked to active pro- 
moters) and the production of short 5’ transcripts’®. Interestingly, 
many of these genes are not producing mature transcripts’’°. Very 
recent genome-wide studies in Drosophila have revealed that many 
genes have peaks of paused Pol II, including important regulatory
genes of early Drosophila development*’. Numbers of promoters 
with paused Pol II in Drosophila, as defined by exhibiting additional 
signatures of paused Pol II such as NELF occupancy and promoter
melting*”, are estimated conservatively to be about 20% of all genes 
that have any associated Pol II. 

Is this mechanism of elongation control used only for the esti- 
mated 20% of genes in Drosophila’*, and >50% (estimated by dif- 
ferent criteria) in human stem cells, or is pausing a still more 
universal step on the pathway of transcription of most genes? The 
inhibition of P-TEFb, a kinase that is critical for Pol II
escape from pausing into productive elongation, can lead to an
80% reduction of Pol II transcription’, indicating that these steps
may be pervasive, though not necessarily rate limiting, under the 
particular cellular conditions examined. 


Questions and future prospects 


What are the proteins and RNAs that set up and maintain paused 
Pol II, and what is the mechanism used to achieve this? In Drosophila, 
GAGA factor (GAF; also known as Trl) is bound before heat shock 
induction and seems to be critical for setting up the pause on Hsp70 
(refs 37 and 38). However, many, but not all, genes showing paused 
Pol II have associated GAF, and GAF homologues are not obvious in 
mammals. Is there a class of transcription factors with GAF-like 
function, and how do they function mechanistically? Although 
NELF and DSIF are major players, their depletion leads to neither 
a complete loss of paused Pol II nor high constitutive expression. Do 
other factors, and perhaps inherent features of Pol II and the under- 
lying DNA or RNA sequence, also participate in this process? Can we 
devise other genome-wide approaches that allow characterization of 
paused Pol II RNAs at nucleotide resolution and thereby improve
the search for global sequence elements? 

How dynamic is the promoter-proximal paused Pol II? The paused 
Pol II has been described by some as performing abortive transcrip- 
tion, meaning that Pol II undergoes termination before completing a 
full-length transcript. An extreme version of this view posits that the 
paused Pol II is terminated and replaced with productive Pol II following
gene activation. However, there is no evidence in vivo that
abortive transcription is a necessary property of these paused Pol II 
molecules. Clearly, there are cases in which Pol II does abort tran- 
scription early in the process of transcription in eukaryotes, for 
example, HIV transcription. However, these short HIV transcripts 
are terminated downstream of the HIV pause region and these ter- 
mination events are dictated by mechanisms distinct from pausing”. 
In the case of Drosophila Hsp70, short transcripts do not accumulate, 
although some fraction of transcripts fail to efficiently elongate in 
run-on assays and are likely to be back-tracked and arrested. 
However, TFIIS is also present at the promoter and may allow these
arrested Pol II complexes to be in dynamic equilibrium with elonga- 
tionally competent paused complexes””’. For some genes, Pol II 
shows only fractional promoter occupancy—less than one Pol II 
per promoter. Does that reflect only a fraction of the cells having a 
poised promoter and thus an active or activatable gene? Or does this 
variation in occupancy reflect differences in the relative rates with 
which Pol II enters and escapes from the pause region? Both of these 
steps may be influenced by the spectrum of regulatory proteins 
bound to enhancer and promoter regions. These difficult questions 
concerning molecular dynamics at native gene loci should be addres- 
sable with the improved optical approaches discussed above, and 
with carefully developed in vitro systems that recapitulate all of the 
in vivo landmarks of these Pol II complexes. 
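
The two explanations for fractional occupancy raised above can be made concrete with a deliberately simple two-state model. This is an illustrative assumption and not a model proposed in the text: each promoter is either empty or holds one paused Pol II, with a hypothetical entry rate `k_entry` and escape rate `k_escape`. Setting d(occ)/dt = k_entry(1 - occ) - k_escape·occ to zero gives a steady-state occupancy of k_entry / (k_entry + k_escape).

```python
def steady_state_occupancy(k_entry: float, k_escape: float) -> float:
    """Steady-state fraction of time a promoter holds a paused Pol II.

    Assumes a two-state promoter (empty or occupied) with first-order
    entry into and escape from the pause region.
    """
    if k_entry < 0 or k_escape < 0 or (k_entry + k_escape) == 0:
        raise ValueError("rates must be non-negative and not both zero")
    return k_entry / (k_entry + k_escape)


def population_occupancy(fraction_poised: float, k_entry: float, k_escape: float) -> float:
    """Average Pol II per promoter when only some cells have a poised promoter."""
    if not 0.0 <= fraction_poised <= 1.0:
        raise ValueError("fraction_poised must lie between 0 and 1")
    return fraction_poised * steady_state_occupancy(k_entry, k_escape)


if __name__ == "__main__":
    # Scenario A: half the cells are poised and Pol II never escapes.
    print(population_occupancy(0.5, k_entry=1.0, k_escape=0.0))
    # Scenario B: every cell is poised, but entry and escape rates are equal.
    print(population_occupancy(1.0, k_entry=1.0, k_escape=1.0))
```

Both scenarios give the same population-average occupancy of 0.5. Under these assumptions an ensemble measurement such as ChIP cannot distinguish them, which is why approaches that resolve single loci or kinetics are needed.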

How do activator proteins interface with paused Pol II to influence 
its escape to productive elongation? Clearly the P-TEFb kinase is a 
critical executor of the activation signal, and its presence and ability
to phosphorylate components of the paused Pol II complex are crit- 
ical for stimulating productive elongation”’. In some cases, activators 
are known to interact with P-TEFb, at least in simple pull-down or 
co-immunoprecipitation assays”°; however, in other cases, activators 
show no detectable affinity for P-TEFb’’, and other existing mechan- 
isms”*, or novel mechanisms yet to be discovered, allow activators to 
communicate with P-TEFb. 

In summary, the pervasiveness of paused Pol II is causing the re- 
evaluation of the long-held textbook view of transcription and its 
regulation. For decades, the major mechanism of gene regulation in 
higher eukaryotes was thought to reside at the level of either recruitment
of Pol II to promoters or transcription initiation. In light of
genome-wide data’'®, a post-recruitment and early elongation 
mechanism needs to be considered as a major mode of regulation 
in higher eukaryotes. Drosophila has been a useful model system for

Evolution of genes and genomes on the
Drosophila phylogeny 


Drosophila 12 Genomes Consortium* 


Comparative analysis of multiple genomes in a phylogenetic framework dramatically improves the precision and sensitivity 
of evolutionary inference, producing more robust results than single-genome analyses can provide. The genomes of 12 
Drosophila species, ten of which are presented here for the first time (sechellia, simulans, yakuba, erecta, ananassae, persimilis, 
willistoni, mojavensis, virilis and grimshawi), illustrate how rates and patterns of sequence divergence across taxa can 
illuminate evolutionary processes on a genomic scale. These genome sequences augment the formidable genetic tools that 
have made Drosophila melanogaster a pre-eminent model for animal genetics, and will further catalyse fundamental research 
on mechanisms of development, cell biology, genetics, disease, neurobiology, behaviour, physiology and evolution. Despite 
remarkable similarities among these Drosophila species, we identified many putatively non-neutral changes in 
protein-coding genes, non-coding RNA genes, and cis-regulatory regions. These may prove to underlie differences in the
ecology and behaviour of these diverse species.


As one might expect from a genus with species living in deserts, in the 
tropics, on chains of volcanic islands and, often, commensally with 
humans, Drosophila species vary considerably in their morphology, 
ecology and behaviour’. Species in this genus span a wide range of 
global distributions: the 12 sequenced species originate from Africa, 
Asia, the Americas and the Pacific Islands, and also include cos- 
mopolitan species that have colonized the planet (D. melanogaster 
and D. simulans) as well as closely related species that live on single 
islands (D. sechellia)?. A variety of behavioural strategies is also 
encompassed by the sequenced species, ranging in feeding habit from 
generalist, such as D. ananassae, to specialist, such as D. sechellia, 
which feeds on the fruit of a single plant species. 

Despite this wealth of phenotypic diversity, Drosophila species 
share a distinctive body plan and life cycle. Although only D. mela- 
nogaster has been extensively characterized, it seems that the most 
important aspects of the cellular, molecular and developmental bio- 
logy of these species are well conserved. Thus, in addition to provid- 
ing an extensive resource for the study of the relationship between 
sequence and phenotypic diversity, the genomes of these species 
provide an excellent model for studying how conserved functions 
are maintained in the face of sequence divergence. These genome 
sequences provide an unprecedented dataset to contrast genome 
structure, genome content, and evolutionary dynamics across the 
well-defined phylogeny of the sequenced species (Fig. 1). 


Genome assembly, annotation and alignment 

Genome sequencing and assembly. We used the previously pub- 
lished sequence and updated assemblies for two Drosophila species, 
D. melanogaster (release 4) and D. pseudoobscura? (release 2), and 
generated DNA sequence data for 10 additional Drosophila genomes 
by whole-genome shotgun sequencing®’. These species were chosen 
to span a wide variety of evolutionary distances, from closely related 
pairs such as D. sechellia/D. simulans and D. persimilis/D. pseudoobs- 
cura to the distantly related species of the Drosophila and Sophophora 
subgenera. Whereas the time to the most recent common ancestor of 
the sequenced species may seem small on an evolutionary timescale, 
the evolutionary divergence spanned by the genus Drosophila exceeds
that of the entire mammalian radiation when generation time is
taken into account, as discussed further in ref. 8. We sequenced seven 
of the new species (D. yakuba, D. erecta, D. ananassae, D. willistoni, 
D. virilis, D. mojavensis and D. grimshawi) to deep coverage (8.4 to 
11.0X) to produce high quality draft sequences. We sequenced two 
species, D. sechellia and D. persimilis, to intermediate coverage 
(4.9X and 4.1X, respectively) under the assumption that the availability
of a sister species sequenced to high coverage would obviate
the need for deep sequencing without sacrificing draft genome qual- 
ity. Finally, seven inbred strains of D. simulans were sequenced to low 
coverage (2.9X coverage from w501 and ~1X coverage of six other
strains) to provide population variation data’. Further details of the 
sequencing strategy can be found in Table 1, Supplementary Table 1 
and section 1 in Supplementary Information. 
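
Table 1 notes that Q20 coverage is based on the number of assembled reads, the average Q20 read length and the assembled size excluding gaps. A minimal sketch of that arithmetic follows, assuming the three quantities are combined as (reads × read length) / assembled size. The function name and the example numbers are hypothetical and are not taken from the sequencing data.

```python
def q20_coverage(n_assembled_reads: int,
                 mean_q20_read_length: float,
                 assembled_size_bp: int) -> float:
    """Fold coverage: total Q20 bases in assembled reads per assembled base.

    assembled_size_bp is the assembly size excluding gaps.
    """
    if assembled_size_bp <= 0:
        raise ValueError("assembled size must be positive")
    return n_assembled_reads * mean_q20_read_length / assembled_size_bp


if __name__ == "__main__":
    # Hypothetical inputs: 1,000,000 reads averaging 700 Q20 bases over a
    # 140 Mb assembly.
    print(f"{q20_coverage(1_000_000, 700, 140_000_000):.1f}X")
```

With these made-up inputs the result is 5.0X, which is in the range the text calls intermediate coverage.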

We generated an initial draft assembly for each species using one of 
three different whole-genome shotgun assembly programs (Table 1). 
For D. ananassae, D. erecta, D. grimshawi, D. mojavensis, D. virilis and
D. willistoni, we also generated secondary assemblies; reconciliation 
of these with the primary assemblies resulted in a 7–30% decrease in
the estimated number of misassembled regions and a 12–23%
increase in the N50 contig size® (Supplementary Table 2). For 
D. yakuba, we generated 52,000 targeted reads across low-quality 
regions and gaps to improve the assembly. This doubled the mean 
contig and scaffold sizes and increased the total fraction of high 
quality bases (quality score (Q) > 40) from 96.5% to 98.5%. We 
improved the initial 2.9X D. simulans w501 whole-genome shotgun
assembly by filling assembly gaps with contigs and unplaced reads 
from the ~1X assemblies of the six other D. simulans strains, gene- 
rating a ‘mosaic’ assembly (Supplementary Table 3). This integration 
markedly improved the D. simulans assembly: the N50 contig size of 
the mosaic assembly, for instance, is more than twice that of the 
initial w501 assembly (17 kb versus 7 kb).
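
For readers unfamiliar with the N50 statistic quoted above, the sketch below computes it following the definition given with Table 1: the largest length L such that 50% of all nucleotides are contained in contigs of size ≥L. The function name, the optional `min_length` filter (Table 1 reports contigs ≥2 kb) and the toy contig lengths are illustrative assumptions, not the consortium's pipeline.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int], min_length: int = 0) -> int:
    """Largest L such that contigs of size >= L hold at least half of all bases."""
    lengths = sorted((n for n in contig_lengths if n >= min_length), reverse=True)
    if not lengths:
        raise ValueError("no contigs pass the length filter")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        # Walk from the longest contig down until half of the bases are covered.
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable: cumulative sum always reaches half the total")


if __name__ == "__main__":
    # Toy example: 20 bases in total; contigs of length >= 5 hold 11 of them.
    print(n50([2, 3, 4, 5, 6]))
```

Because N50 is weighted by bases rather than by contig count, joining a handful of large contigs can shift it substantially, which is consistent with the gap-filling step more than doubling the D. simulans value.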

Finally, one advantage of sequencing genomes of multiple closely 
related species is that these evolutionary relationships can be 
exploited to dramatically improve assemblies. D. yakuba and 
D. simulans contigs and scaffolds were ordered and oriented using 
pairwise alignment to the well-validated D. melanogaster genome 


*A list of participants and affiliations appears at the end of the paper.
Figure 1| Phylogram of the 12 sequenced species of Drosophila. Phylogram
derived using pairwise genomic mutation distances and the neighbour- 
joining method’**"**. Numbers below nodes indicate the per cent of genes 
supporting a given relationship, based on evolutionary distances estimated 
from fourfold-degenerate sites (left of solidus) and second codon positions 
(right of solidus). Coloured blocks indicate support from bayesian 


sequence (Supplementary Information section 2). Likewise, the
4–5X D. persimilis and D. sechellia assemblies were improved by
assisted assembly using the sister species (D. pseudoobscura and 
D. simulans, respectively) to validate both alignments between 
reads and linkage information. For the remaining species, com- 
parative syntenic information, and in some cases linkage informa- 
tion, were also used to pinpoint locations of probable genome mis- 
assembly, to assign assembly scaffolds to chromosome arms and to 
infer their order and orientation along euchromatic chromosome 
arms, supplementing experimental analysis based on known 
markers (A. Bhutkar, S. Russo, S. Schaeffer, T. F. Smith and W. M. 
Gelbart, personal communication) (Supplementary Information 
section 2). 

The mitochondrial (mt)DNA of D. melanogaster, D. sechellia, 
D. simulans (siII), D. mauritiana (maII) and D. yakuba have been
previously sequenced'"””. For the remaining species (except D. pseu- 
doobscura, the DNA from which was prepared from embryonic 
nuclei), we were able to assemble full mitochondrial genomes, 
excluding the A+T-rich control region (Supplementary Information
section 2)'*. In addition, the genome sequences of three
Wolbachia endosymbionts (Wolbachia wSim, Wolbachia wAna and 
Wolbachia wWil) were assembled from trace archives, in D. simulans, 
D. ananassae and D. willistoni, respectively'*. All of the genome 
sequences described here are available in FlyBase (www.flybase.org)
and GenBank (www.ncbi.nlm.nih.gov) (Supplementary Tables 4 
and 5). 

Repeat and transposable element annotation. Repetitive DNA 
sequences such as transposable elements pose challenges for 


(posterior probability (PP), upper blocks) and maximum parsimony (MP; 
bootstrap values, lower blocks) analyses of data partitioned by chromosome 
arm. Branch lengths indicate the number of mutations per site (at fourfold- 
degenerate sites) using the ordinary least squares method. See ref. 154 for a 
discussion of the uncertainties in the D. yakuba/D. erecta clade. 


whole-genome shotgun assembly and annotation. Because the best 
approach to transposable element discovery and identification is still 
an active and unresolved research question, we used several repeat 
libraries and computational strategies to estimate the transposable 
element/repeat content of the 12 Drosophila genome assemblies 
(Supplementary Information section 3). Previously curated transposable
element libraries in D. melanogaster provided the starting
point for our analysis; to limit the effects of ascertainment bias, we 
also developed de novo repeat libraries using PILER-DF'*'® and 
ReAS'’. We used four transposable element/repeat detection meth- 
ods (RepeatMasker, BLASTER-TX, RepeatRunner and CompTE) in 
conjunction with these transposable element libraries to identify 
repetitive elements in non-melanogaster species. We assessed the 
accuracy of each method by calibration with the estimated 5.5% 
transposable element content in the D. melanogaster genome, which 
is based on a high-resolution transposable element annotation’® 
(Supplementary Fig. 1). On the basis of our results, we suggest a 
hybrid strategy for new genome sequences, employing translated 
BLAST with general transposable element libraries and 
RepeatMasker with species-specific ReAS libraries to estimate the 
upper and lower bound on transposable element content. 
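
The hybrid strategy amounts to running two estimators and treating their results as a bracket. A minimal sketch follows, assuming each method yields a single percentage for transposable element content. The function name and the example values are hypothetical. The midpoint is included because Table 2 reports repeat coverage as the midpoint between two such estimates.

```python
from typing import Tuple


def te_content_bounds(estimate_a: float, estimate_b: float) -> Tuple[float, float, float]:
    """Return (lower bound, upper bound, midpoint) from two TE-content estimates (%)."""
    for value in (estimate_a, estimate_b):
        if not 0.0 <= value <= 100.0:
            raise ValueError("estimates must be percentages between 0 and 100")
    lower, upper = sorted((estimate_a, estimate_b))
    return lower, upper, (lower + upper) / 2


if __name__ == "__main__":
    # Hypothetical estimates from two detection methods for one genome.
    print(te_content_bounds(7.0, 4.0))
```

The sketch does not assume which method gives the higher value, because the text does not say which of the two supplies the upper bound.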

Protein-coding gene annotation. We annotated protein-coding 
sequences in the 11 non-melanogaster genomes, using four different 
de novo gene predictors (GeneID, SNAP”, N-SCAN?! and 
CONTRAST”); three homology-based predictors that transfer 
annotations from D. melanogaster (GeneWise”’, Exonerate™, 
GeneMapper”); and one predictor that combined de novo and 
homology-based evidence (Gnomon”’). These gene prediction sets 


Table 1 | A summary of sequencing and assembly properties of each new genome

Final assembly  Genome centre  Q20 coverage (X)  Assembly size (Mb)  No. of contigs ≥2 kb  N50 contig ≥2 kb (kb)  Per cent of base pairs with quality >Q40
D. simulans     WUGSC*         2.9               37.8                10,843                17                     90.3
D. sechellia    Broad†         4.9               66.6                9,713                 43                     90.6
D. yakuba       WUGSC*         9.1               65.7                6,344                 125                    98.5
D. erecta       Agencourt†     10.6              52.7                3,283                 458                    99.2
D. ananassae    Agencourt†     8.9               231.0               8,155                 113                    98.5
D. persimilis   Broad†         4.1               88.4                14,547                20                     93.3
D. willistoni   JCVI‡          8.4               235.5               6,652                 197                    97.4
D. virilis      Agencourt†     8.0               206.0               5,327                 136                    98.7
D. mojavensis   Agencourt†     8.2               93.8                5,734                 132                    98.6
D. grimshawi    Agencourt†     7.9               200.5               9,632                 114                    97.1

Contigs, contiguous sequences not interrupted by gaps; N50, the largest length L such that 50% of all nucleotides are contained in contigs of size ≥L. The Q20 coverage of contigs is based on the
number of assembled reads, average Q20 read length and the assembled size excluding gaps. Assemblers used: *PCAP6, †ARACHNE4.5 and ‡Celera Assembler 7.

Table 2 | A summary of annotated features across all 12 genomes

                  Protein-coding gene annotations                  Non-coding RNA annotations
Species           Total no. of protein-coding    Coding sequence/  tRNA       snoRNA  miRNA  rRNA          snRNA  Repeat         Genome size (Mb;
                  genes (per cent with           intron (Mb)       (pseudo)                  (5.8S + 5S)          coverage (%)*  assembly†/flow
                  D. melanogaster homologue)                                                                                     cytometry‡)
D. melanogaster   13,733 (100%)                  38.9/21.8         297 (4)    250     78     101           28     5.35           118/200
D. simulans       15,983 (80.0%)                 45.8/19.6         268 (2)    246     70     72            32     2.73           111/162
D. sechellia      16,884 (81.2%)                 47.9/21.9         312 (13)   242     78     133           30     3.67           115/171
D. yakuba         16,423 (82.5%)                 50.8/22.9         380 (52)   255     80     55            37     12.04          127/190
D. erecta         15,324 (86.4%)                 49.1/22.0         286 (2)    252     81     101           38     6.97           134/135
D. ananassae      15,276 (83.0%)                 57.3/22.3         472 (165)  194     76     134           29     24.93          176/217
D. pseudoobscura  16,363 (78.2%)                 49.7/24.0         295 (1)    203     73     55            31     2.76           127/193
D. persimilis     17,325 (72.6%)                 54.0/21.9         306 (1)    199     75     80            31     8.47           138/193
D. willistoni     15,816 (78.8%)                 65.4/23.5         484 (164)  216     77     76            37     15.57          187/222
D. virilis        14,680 (82.7%)                 57.9/21.7         279 (2)    165     74     294           81     13.96          172/364
D. mojavensis     14,849 (80.8%)                 57.8/21.9         267 (3)    139     71     74            30     8.92           161/130
D. grimshawi      15,270 (81.3%)                 54.9/22.5         261 (1)    154     82     70            32     2.84           138/231

* Repeat coverage calculated as the fraction of scaffolds >200 kb covered by repeats, estimated as the midpoint between BLASTER-tx + PILER and RepeatMasker + ReAS (Supplementary
Information section 3). †Total genome size estimated as the sum of base pairs in genomic scaffolds >200,000 bp. ‡Genome size estimates based on flow cytometry.


were combined using GLEAN, a gene model combiner that chooses 
the most probable combination of start, stop, donor and acceptor 
sites from the input predictions”””*. All analyses reported here, unless 
otherwise noted, relied on a reconciled consensus set of predicted 
gene models—the GLEAN-R set (Table 2, and Supplementary 
Information section 4.1). 
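
The N50 statistic defined in the footnote to Table 1 can be illustrated with a short calculation. The following is a minimal sketch, not the assemblers' own code: the function name `n50` and the example contig lengths are hypothetical and chosen only to make the definition concrete.

```python
def n50(contig_lengths):
    """Return the largest length L such that contigs of length >= L
    together contain at least 50% of all assembled nucleotides."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        # The first contig that pushes the running sum to half of the
        # assembly defines N50.
        if running >= half_total:
            return length
    raise ValueError("contig_lengths must not be empty")


# Hypothetical contig lengths in kb (not taken from any real assembly).
example_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(example_contigs))  # 70: contigs of 80 and 70 kb hold 150 of 300 kb
```

Because N50 is weighted by nucleotides rather than by contig count, a few long contigs can raise it even when the assembly also contains many short fragments.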

Quality of gene models. As the first step in assessing the quality of the 
GLEAN-R gene models, we used expression data from microarray 
experiments on adult flies, with arrays custom-designed for D. simu- 
lans, D. yakuba, D. ananassae, D. pseudoobscura, D. virilis and 
D. mojavensis (GEO series GSE6640; Supplementary Information 
section 4.2). We detected expression significantly above negative 
controls (false-discovery-rate-corrected Mann–Whitney U (MWU) 
P<0.001) for 77-93% of assayed GLEAN-R models, representing 
50-68% of the total GLEAN-R predictions in each species (Supple- 
mentary Table 6). Evolutionarily conserved gene models are much 
more likely to be expressed than lineage-specific ones (Fig. 2). 
Although these data cannot confirm the detailed structure of gene 
models, they do suggest that the majority of GLEAN-R models 
contain sequence that is part of a poly-adenylated transcript. 
Approximately 20% of transcription in D. melanogaster seems to 
be unassociated with protein-coding genes, and our microarray 
experiments fail to detect conditionally expressed genes. Thus, 
transcript abundance cannot conclusively establish the presence or 
absence of a protein-coding gene. Nonetheless, we believe these 
expression data increase our confidence in the reliability of the 
GLEAN-R models, particularly those supported by homology evid- 
ence (Fig. 2). 

Because the GLEAN-R gene models were built using assemblies 
that were not repeat masked, it is likely that some proportion of gene 
models are false positives corresponding to coding sequences of 
transposable elements. We used RepeatMasker with de novo ReAS 
libraries and PFAM structural annotations of the GLEAN-R gene set 
to flag potentially transposable element-contaminated gene models 
(Supplementary Information section 4.2). These procedures suggest 
that 5.6-32.3% of gene models in non-melanogaster species corre- 
spond to protein-coding content derived from transposable elements 
(Supplementary Table 7); these transposable element-contaminated 
gene models are almost exclusively confined to gene predictions 
without strong homology support (Fig. 2). Transposable element- 
contaminated gene models are excluded from the final gene predic- 
tion set used for subsequent analysis, unless otherwise noted. 

Figure 2 | Gene models in 12 Drosophila genomes. Number of gene models 
that fall into one of five homology classes: single-copy orthologues in all 
species (single-copy orthologues), conserved in all species as orthologues or 
paralogues (conserved homologues), a D. melanogaster homologue, but not 
found in all species (patchy homologues with mel.), conserved in at least two 
species but without a D. melanogaster homologue (patchy homologues, no 
mel.), and found only in a single lineage (lineage specific). For those species 
with expression data, pie charts indicate the fraction of genes in each 
homology class that fall into one of four evidence classes (see text for details). 

Homology assignment. Two independent approaches were used to 
assign orthology and paralogy relationships among euchromatic 
D. melanogaster gene models and GLEAN-R predictions. The first 
approach was a fuzzy reciprocal BLAST (FRB) algorithm, which is an 
extension of the reciprocal BLAST method applicable to multiple 
species simultaneously (Supplementary Information section 5.1). 
Because the FRB algorithm does not integrate syntenic information, 
we also used a second approach based on Synpipe (Supplementary 
Information section 5.2), a tool for synteny-aided orthology 
assignment. To generate a reconciled set of homology calls, pairwise 
Synpipe calls (between each species and D. melanogaster) were 
mapped to GLEAN-R models, filtered to retain only 1:1 relation- 
ships, and added to the FRB calls when they did not conflict and 
were non-redundant. This reconciled FRB + Synpipe set of homo- 
logy calls forms the basis of our subsequent analyses. There were 
8,563 genes with single-copy orthologues in the melanogaster group 
and 6,698 genes with single-copy orthologues in all 12 species; similar 
numbers of genes were also obtained with an independent 
approach. Most single-copy orthologues are expressed and are free 
from potential transposable element contamination, suggesting that 
the reconciled orthologue set contains robust and high-quality gene 
models (Fig. 2). 
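
The reconciliation step described above (keep only 1:1 Synpipe calls, then add them to the FRB calls when they are neither redundant nor conflicting) can be sketched as follows. This is an illustrative simplification under stated assumptions, not the authors' pipeline: the function names, the gene identifiers, and the definition of a conflict (a Synpipe pair that reuses a gene already present in an FRB pair) are all hypothetical.

```python
from collections import Counter


def one_to_one(pairs):
    """Keep only (mel_gene, other_gene) pairs in which each gene occurs once."""
    mel_counts = Counter(mel for mel, _ in pairs)
    other_counts = Counter(other for _, other in pairs)
    return [(mel, other) for mel, other in pairs
            if mel_counts[mel] == 1 and other_counts[other] == 1]


def reconcile(frb_calls, synpipe_calls):
    """Add 1:1 Synpipe calls to the FRB calls unless redundant or conflicting.

    Assumption: a call conflicts if either of its genes already appears
    in the merged set paired with a different partner.
    """
    merged = list(frb_calls)
    seen = set(frb_calls)
    used_mel = {mel for mel, _ in frb_calls}
    used_other = {other for _, other in frb_calls}
    for mel, other in one_to_one(synpipe_calls):
        if (mel, other) in seen:
            continue  # redundant: FRB already made this call
        if mel in used_mel or other in used_other:
            continue  # conflicting: one gene is already assigned elsewhere
        merged.append((mel, other))
        seen.add((mel, other))
        used_mel.add(mel)
        used_other.add(other)
    return merged


# Hypothetical calls between D. melanogaster and one other species.
frb = [("melA", "yakA"), ("melB", "yakB")]
synpipe = [("melA", "yakA"),                      # redundant with FRB
           ("melB", "yakX"),                      # conflicts with FRB
           ("melC", "yakC"),                      # new, non-conflicting
           ("melD", "yakD1"), ("melD", "yakD2")]  # not 1:1, filtered out
print(reconcile(frb, synpipe))
```

In this toy example only the `melC`/`yakC` pair is added, so FRB calls take precedence and Synpipe contributes only where FRB is silent.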

Validation of homology calls. Because both the FRB algorithm and 
Synpipe rely on BLAST-based methods to infer similarities, rapidly 
evolving genes may be overlooked. Moreover, assembly gaps and 
poor-quality sequence may lead to erroneous inferences of gene 
loss. To validate putative gene absences, we used a synteny-based 
GeneWise pipeline to find potentially missed homologues of D. mel- 
anogaster proteins (Supplementary Information section 5.4). Of the 
21,928 cases in which a D. melanogaster gene was absent from another 
species in the initial homology call set, we identified plausible homo- 
logues for 13,265 (60.5%), confirmed 4,546 (20.7%) as genuine 
absences, and were unable to resolve 4,117 (18.8%). Because this 
approach is conservative and only confirms strongly supported 
absences, we are probably underestimating the number of genuine 
absences. 
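
As a quick consistency check on the figures in this paragraph, the three outcome counts can be summed and converted to percentages. The snippet below is only a worked-arithmetic sketch using the numbers quoted in the text; the variable names are illustrative.

```python
# Counts quoted in the text for D. melanogaster genes initially called absent.
total_absent_calls = 21_928
outcomes = {
    "plausible homologue found": 13_265,
    "confirmed genuine absence": 4_546,
    "unresolved": 4_117,
}

# The three outcomes should account for every initial absence call.
assert sum(outcomes.values()) == total_absent_calls

percentages = {name: round(100 * count / total_absent_calls, 1)
               for name, count in outcomes.items()}
for name, pct in percentages.items():
    print(f"{name}: {pct}%")
```

The three categories partition the 21,928 cases exactly, and the rounded percentages match the 60.5%, 20.7% and 18.8% reported above.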

Coding gene alignment and filtering. Investigating the molecular 
evolution of orthologous and paralogous genes requires accurate 
multi-species alignments. Initial amino acid alignments were 
generated using TCOFFEE and converted to nucleotide alignments 
(Supplementary Table 8). To reduce biases in downstream analyses, 
a simple computational screen was developed to identify and mask 
problematic regions of each alignment (Supplementary Information 
section 6). Overall, 2.8% of bases were masked in the melanogaster 
group alignments, and 3.0% of bases were masked in the full 12 
species alignments, representing 8.5% and 13.8% of alignment col- 
umns, respectively. The vast majority of masked bases are masked in 
no more than one species (Supplementary Fig. 3), suggesting that the 
masking procedure is not simply eliminating rapidly evolving regions 
of the genome. We find an appreciably higher frequency of masked 
bases in lower-quality D. simulans and D. sechellia assemblies, com- 
pared to the more divergent (from D. melanogaster) but higher- 
quality D. erecta and D. yakuba assemblies, suggesting a higher error 
rate in accurately predicting and aligning gene models in 
lower-quality assemblies (Supplementary Information section 6 and 
Supplementary Fig. 3). We used masked versions of the alignments, 
including only the longest D. melanogaster transcripts for all sub- 
sequent analysis unless otherwise noted. 

Annotation of non-coding (nc)RNA genes. Using de novo and 
homology-based approaches we annotated over 9,000 ncRNA genes 
from recognized ncRNA classes (Table 2, and Supplementary 
Information section 7). In contrast to the large number of predictions 
observed for many ncRNA families in vertebrates (due in part to large 
numbers of ncRNA pseudogenes), the number of ncRNA genes 
per family predicted by RFAM and tRNAscan in Drosophila is 
relatively low (Table 2). This suggests that ncRNA pseudogenes are 
largely absent from Drosophila genomes, which is consistent with 
the low number of protein-coding pseudogenes in Drosophila. 
The relatively low numbers of some classes of ncRNA genes (for 
example, small nucleolar (sno)RNAs) in the Drosophila subgenus 
are likely to be an artefact of rapid rates of evolution in these types 
of genes and the limitation of the homology-based methods used to 
annotate distantly related species. 


Evolution of genome structure 

Coarse-level similarities among Drosophilids. At a coarse level, 
genome structure is well conserved across the 12 sequenced species. 
Total genome size estimated by flow cytometry varies less than 
threefold across the phylogeny, ranging from 130 Mb (D. mojavensis) to 
364 Mb (D. virilis) (Table 2), in contrast to the order of magnitude 
difference between Drosophila and mammals. Total protein-coding 
sequence ranges from 38.9 Mb in D. melanogaster to 65.4 Mb in 
D. willistoni. Intronic DNA content is also largely conserved, ranging 
from 19.6 Mb in D. simulans to 24.0 Mb in D. pseudoobscura 
(Table 2). This contrasts dramatically with transposable 
element-derived genomic DNA content, which varies considerably across 
genomes (Table 2) and correlates significantly with euchromatic 
genome size (estimated as the summed length of contigs >200 kb) 
(Kendall’s τ = 0.70, P = 0.0016). 
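
To make the rank correlation concrete, the sketch below computes Kendall's τ (the tie-corrected τ-b variant) between the repeat coverage and assembly size columns of Table 2. It is an illustration only: the choice of τ-b, the function name and the use of the rounded table values are assumptions, so the result need not reproduce the published value exactly.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b: (concordant - discordant) pairs, corrected for ties."""
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0:
            ties_x += 1
        if dy == 0:
            ties_y += 1
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) / 2
    return (concordant - discordant) / sqrt((n_pairs - ties_x) * (n_pairs - ties_y))


# Table 2 values, in table order (D. melanogaster ... D. grimshawi).
repeat_coverage_pct = [5.35, 2.73, 3.67, 12.04, 6.97, 24.93,
                       2.76, 8.47, 15.57, 13.96, 8.92, 2.84]
assembly_size_mb = [118, 111, 115, 127, 134, 176,
                    127, 138, 187, 172, 161, 138]

tau = kendall_tau_b(repeat_coverage_pct, assembly_size_mb)
print(f"Kendall tau-b = {tau:.2f}")
```

A rank-based statistic is a natural choice here because repeat content spans roughly an order of magnitude across species and a few repeat-rich genomes would otherwise dominate a linear correlation.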

To investigate overall conservation of genome architecture at an 
intermediate scale, we analysed synteny relationships across species 
using Synpipe (Supplementary Information section 9.1). Synteny 
block size and average number of genes per block varies across the 
phylogeny as expected, with the number of blocks increasing and the 
average size of blocks decreasing with increasing evolutionary dis- 
tance from D. melanogaster (A. Bhutkar, S. Russo, T. F. Smith and W. 
M. Gelbart, personal communication) (Supplementary Fig. 4). We 
inferred 112 syntenic blocks between D. melanogaster and D. sechellia 
(with an average of 122 genes per block), compared to 1,406 syntenic 
blocks between D. melanogaster and D. grimshawi (with an average of 
8 genes per block). On average, 66% of each genome assembly was 
covered by syntenic blocks, ranging from 68% in D. sechellia to 58% 
in D. grimshawi. 

Similarity across genomes is largely recapitulated at the level of 
individual genes, with roughly comparable numbers of predicted 
protein-coding genes across the 12 species (Table 2). The majority 
of predicted genes in each species have homologues in D. melanoga- 
ster (Table 2, Supplementary Table 9). Moreover, most of the 13,733 
protein-coding genes in D. melanogaster are conserved across the 
entire phylogeny: 77% have identifiable homologues in all 12 gen- 
omes, 62% can be identified as single-copy orthologues in the six 
genomes of the melanogaster group and 49% can be identified as 
single-copy orthologues in all 12 genomes. The number of functional 
non-coding RNA genes predicted in each Drosophila genome is 
also largely conserved, ranging from 584 in D. mojavensis to 908 in 
D. ananassae (Table 2). 

There are several possible explanations for the observed interspe- 
cific variation in gene content. First, approximately 700 D. melano- 
gaster gene models have been newly annotated since the FlyBase 
Release 4.3 annotations used in the current study, reducing the dis- 
crepancy between D. melanogaster and the other sequenced genomes 
in this study. Second, because low-coverage genomes tend to have 
more predicted gene models, we suspect that artefactual duplication 
of genomic segments due to assembly errors inflates the number of 
predicted genes in some species. Finally, the non-melanogaster spe- 
cies have many more predicted lineage-specific genes than D. mela- 
nogaster, and it is possible that some of these are artefactual. In the 
absence of experimental evidence, it is difficult to distinguish genuine 
lineage-specific genes from putative artefacts. Future experimental 
work will be required to fully disentangle the causes of interspecific 
variation in gene number. 

Abundant genome rearrangements during Drosophila evolution. 
To study the structural relationships among genomes on a finer 
scale, we analysed gene-level synteny between species pairs. These 
synteny maps allowed us to infer the history and locations of fixed 
genomic rearrangements between species. Although Drosophila spe- 
cies vary in their number of chromosomes, there are six fundamental 
chromosome arms common to all species. For ease of denoting 
chromosomal homology, these six arms are referred to as ‘Muller 
elements’ after Hermann J. Muller, and are denoted A–F. Although 
most pairs of orthologous genes are found on the same Muller ele- 
ment, there is extensive gene shuffling within Muller elements 
between even moderately diverged genomes (Fig. 3, and Supplemen- 
tary Information section 9.1). 

Previous analysis has revealed heterogeneity in rearrangement 
rates among close relatives: careful inspection of 29 inversions that 
differentiate the chromosomes of D. melanogaster and D. yakuba 
revealed that 28 were fixed in the lineage leading to D. yakuba, and 
only one was fixed on the lineage leading to D. melanogaster. 
Rearrangement rates are also heterogeneous across the genome 
among the 12 species: simulations reject a random-breakage model, 
which assumes that all sites are free to break in inversion events, but 
fail to reject a model of coldspots and hotspots for breakpoints 
(S. Schaeffer, personal communication). Furthermore, inversions 
seem to have played important roles in the process of speciation in 
at least some of these taxa. 

One particularly striking example of the dynamic nature of gen- 
ome micro-structure in Drosophila is the homeotic homeobox (Hox) 
gene cluster(s). Hox genes typically occur in genomic clusters, and 
this clustering is conserved across many vertebrate and invertebrate 
taxa, suggesting a functional role for the precise and collinear 
arrangement of these genes. However, several cluster splits have been 
previously identified in Drosophila, and the 12 Drosophila genome 
sequences provide additional evidence against the functional 
importance of Hox gene clustering in Drosophila. There are seven different 
gene arrangements found across 13 Drosophila species (the 12 
sequenced genomes and D. buzzatii), with no species retaining the 
inferred ancestral gene order. It thus seems that, in Drosophila, Hox 
genes do not require clustering to maintain proper function, and are 
a powerful illustration of the dynamism of genome structure across 
the sequenced genomes. 

Transposable element evolution. Mobile, repetitive transposable 
element sequences are a particularly dynamic component of eukar- 
yotic genomes. Transposable element/repeat content (in scaffolds 
>200 kb) varies by over an order of magnitude across the genus, 
ranging from ~2.7% in D. simulans and D. grimshawi to ~25% in 
D. ananassae (Table 2, and Supplementary Fig. 1). These data 
support the lower euchromatic transposable element content in 
D. simulans relative to D. melanogaster, and reveal that euchromatic 
transposable element/repeat content is generally similar within 
the melanogaster subgroup. Within the Drosophila subgenus, 
D. grimshawi has the lowest transposable element/repeat content, 
possibly relating to its ecological status as an island endemic, which 
may minimize the chance for horizontal transfer of transposable 
element families. Finally, the highest levels of transposable element/ 
repeat content are found in D. ananassae and D. willistoni. These 
species also have the highest numbers of pseudo-transfer (t)RNA 
genes (Table 2), indicating a potential relationship between 
pseudo-tRNA genesis and repetitive DNA, as has been established 
in the mouse genome. 

Different classes of transposable elements can vary in abundance 
owing to a variety of host factors, motivating an analysis of the 
intragenomic ecology of transposable elements in the 12 genomes. 
In D. melanogaster, long terminal repeat (LTR) retrotransposons 
have the highest abundance, followed by LINE (long interspersed 
nuclear element)-like retrotransposons and terminal inverted 
repeat (TIR) DNA-based transposons. An unbiased, conservative 
approach (Supplementary Information section 3) for estimating the 
rank order abundance of major transposable element classes suggests 
that these abundance trends are conserved across the entire genus 
(Supplementary Fig. 5). Two exceptions are an increased abundance 
of TIR elements in D. erecta and a decreased abundance of LTR 
elements in D. pseudoobscura; the latter observation may represent 
an assembly artefact because the sister species D. persimilis shows 
typical LTR abundance. Given that individual instances of transpos- 
able element repeats and transposable element families themselves 
are not conserved across the genus, the stability of abundance trends 
for different classes of transposable elements is striking and suggests 
common mechanisms for host-transposable element co-evolution in 
Drosophila. 

Although comprehensive analysis of the structural and evolutionary 
relationships among families of transposable elements in the 12 
genomes remains a major challenge for Drosophila genomics, some 
initial insights can be gleaned from analysis of particularly 
well-characterized transposable element families. Previous analysis has 
shown variable dynamics for the most abundant transposable element 
family (DINE-1) in the D. melanogaster genome: although 
inactive in D. melanogaster, DINE-1 has experienced a recent 
transpositional burst in D. yakuba. Our analysis confirms that this 
element is highly abundant in all of the other sequenced genomes of 
Drosophila, but is not found outside of Diptera. Moreover, the 
inferred phylogenetic relationship of DINE-1 paralogues from 
several Drosophila species suggests vertical transmission as the major 
mechanism for DINE-1 propagation. Likewise, analysis of the Galileo 
and 1360 transposons reveals a widespread but discontinuous 
phylogenetic distribution for both families, notably with both families 
absent in the geographically isolated Hawaiian species, 
D. grimshawi. These results are consistent with an ancient origin of the 
Galileo and 1360 families in the genus and subsequent horizontal 
transfer and/or loss in some lineages. 

Figure 3 | Synteny plots for Muller elements B and C with respect to 
D. melanogaster gene order. The horizontal axis shows D. melanogaster 
gene order for Muller elements B and C, and the vertical axis maps 
homologous locations in individual species (a–f in increasing 
evolutionary distance from D. melanogaster). Left to right on the x axis is 
from telomere to centromere for Muller element B, followed by Muller 
element C from centromere to telomere. Red and green lines represent 
syntenic segments in the same or reverse orientation along the chromosome 
relative to D. melanogaster, respectively. Blue segments show 
transposition of genes from one element to the other. 

The use of these 12 genomes also facilitated the discovery of trans- 
posable element lineages not yet documented in Drosophila, specif- 
ically the P instability factor (PIF) superfamily of DNA transposons. 
Our analysis indicates that there are four distinct lineages of this 
transposon in Drosophila, and that this element has indeed colonized 
many of the sequenced genomes. This superfamily is particularly 
intriguing given that PIF-transposase-like genes have been 
implicated in the origin of at least seven different genes during the 
Drosophila radiation, suggesting that not only do transposable 
elements affect the evolution of genome structure, but that their 
domestication can play a part in the emergence of novel genes. 

D. melanogaster maintains its telomeres by occasional targeted 
transposition of three telomere-specific non-LTR retrotransposons 
(HeT-A, TART and TAHRE) to chromosome ends and not by the 
more common mechanism of telomerase-generated G-rich repeats. 
Multiple telomeric retrotransposons have originated within the 
genus, where they now maintain telomeres, and recurrent loss of 
most of the ORF2 from telomeric retrotransposons (for example, 
TAHRE) has given rise to half-telomeric-retrotransposons (for 
example, HeT-A) during Drosophila evolution. The phylogenetic 
relationship among these telomeric elements is congruent with the 
species phylogeny, suggesting that they have been vertically 
transmitted from a common ancestor. 

ncRNA gene family evolution. Using ncRNA gene annotations 
across the 12-species phylogeny, we inferred patterns of gene copy 
number evolution in several ncRNA families. Transfer RNA genes are 
the most abundant family of ncRNA genes in all 12 genomes, with 
297 tRNAs in D. melanogaster and 261-484 tRNA genes in the other 
species (Table 2). Each genome encodes a single selenocysteine tRNA, 
with the exception of D. willistoni, which seems to lack this gene 
(R. Guigo, personal communication). Elevated tRNA gene counts 
in D. ananassae and D. willistoni are explained almost entirely by 
pseudo-tRNA gene predictions. We infer from the lack of pseudo- 
tRNAs in most Drosophila species, and from similar numbers of 
tRNAs obtained from an analysis of the chicken genome 
(n = 280), that the minimal metazoan tRNA set is encoded by 
~300 genes, in contrast to previous estimates of 497 in human and 
659 in Caenorhabditis elegans. Similar numbers of snoRNAs are 
predicted in the D. melanogaster subgroup (n = 242-255), in which 
sequence similarity is high enough for annotation by homology, with 
fewer snoRNAs (n = 194-216) annotated in more distant members 
of the Sophophora subgenus, and even fewer snoRNAs (n = 139-165) 
predicted in the Drosophila subgenus, in which annotation by homo- 
logy becomes much more difficult. 

Of 78 previously reported micro (mi)RNA genes, 71 (91%) are 
highly conserved across the entire genus, with the remaining seven 
genes (mir-2b-1, -289, -303, -310, -311, -312 and -313) restricted to 
the subgenus Sophophora (Supplementary Information section 7.2). 
All the species contain similar numbers of spliceosomal snRNA genes 
(Table 2), including at least one copy each of the four U12-dependent 
(minor) spliceosomal RNAs, despite evidence for birth and death of 
these genes and the absence of stable subtypes. The unusual, 
lineage-specific expansion in size of U11 snRNA, previously described 
in Drosophila, is even more extreme in D. willistoni. We annotated 
99 copies of the 5S ribosomal (r)RNA gene in a cluster in 
D. melanogaster, and between 13 and 73 partial 5S rRNA genes in clusters in 
the other genomes. Finally, we identified members of several other 
classes of ncRNA genes, including the RNA components of the 
RNase P (1 per genome) and the signal recognition particle (SRP) 
RNA complexes (1-3 per genome), suggesting that these functional 
RNAs are involved in similar biological processes throughout the 
genus. We were only able to locate the roX (RNA on X) genes 
involved in dosage compensation using nucleotide homology in the 
melanogaster subgroup, although analyses incorporating structural 
information have identified roX genes in other members of the 
genus. 

We investigated the evolution of rRNA genes in the 12 sequenced 
genomes, using trace archives to locate sequence variants within the 
transcribed portions of these genes. This analysis revealed moderate 
levels of variation that are not distributed evenly across the rRNA 
genes, with fewest variants in conserved core coding regions, more 
variants in coding expansion regions, and higher still variant abun- 
dances in non-coding regions. The level and distribution of sequence 
variation in rRNA genes are suggestive of concerted evolution, in 
which recombination events uniformly distribute variants through- 
out the rDNA loci, and selection dictates the frequency to which 
variants can expand. 

Protein-coding gene family evolution. For a general perspective on 
how the protein-coding composition of these 12 genomes has chan- 
ged, we examined gene family expansions and contractions in the 
11,434 gene families (including those of size one in each species) 
predicted to be present in the most recent common ancestor of the 
two subgenera. We applied a maximum likelihood model of gene 
gain and loss to estimate rates of gene turnover. This analysis 
suggests that gene families expand or contract at a rate of 0.0012 gains 
and losses per gene per million years, or roughly one fixed gene 
gain/loss across the genome every 60,000 yr. Many gene families (4,692 
or 41.0%) changed in size in at least one species, and 342 families 
showed significantly elevated (P< 0.0001) rates of gene gain and loss 
compared to the genomic average, indicating that non-neutral pro- 
cesses may play a part in gene family evolution. Twenty-two families 
exhibit rapid copy number evolution along the branch leading to 
D. melanogaster (eighteen contractions and four expansions; Sup- 
plementary Table 10). The most common Gene Ontology (GO) 
terms among families with elevated rates of gain/loss include ‘defence 
response’, ‘protein binding’, ‘zinc ion binding’, ‘proteolysis’, and 
‘trypsin activity’. Interestingly, genes involved in ‘defence response’ 
and ‘proteolysis’ also show high rates of protein evolution (see 
below). We also found heterogeneity in overall rates of gene gain 
and loss across lineages, although much of this variation could result 
from interspecific differences in assembly quality. 
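
The conversion from a per-gene turnover rate to a genome-wide waiting time can be checked with simple arithmetic. The sketch below assumes a genome-wide gene count of 13,733 (the D. melanogaster total in Table 2); the text does not state which count underlies the published figure, so this is an approximation rather than the authors' calculation.

```python
# Rate quoted in the text: gains and losses per gene per million years.
rate_per_gene_per_myr = 0.0012

# Assumption: use the D. melanogaster protein-coding gene count from Table 2.
n_genes = 13_733

events_per_myr = rate_per_gene_per_myr * n_genes   # about 16.5 events per Myr
years_per_event = 1_000_000 / events_per_myr       # about 60,700 years

print(f"{events_per_myr:.1f} gains/losses per million years")
print(f"one gain/loss roughly every {years_per_event:,.0f} years")
```

Under this assumption the waiting time comes out near 60,700 years, in line with the figure of roughly 60,000 yr given above.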

Lineage-specific genes. The vast majority of D. melanogaster 
proteins that can be unambiguously assigned a homology pattern 
(Supplementary Information section 5) are inferred to be ancestrally 
present at the genus root (11,348/11,644, or 97.5%). Of the 296 non- 
ancestrally present genes, 252 are either Sophophora-specific, or have 
a complicated pattern of homology requiring more than one gain 
and/or loss on the phylogeny, and are not discussed further. The 
remaining 44 proteins include 14 present in the melanogaster group, 
23 present only in the melanogaster subgroup, 3 unique to the mel- 
anogaster species complex, and 4 found in D. melanogaster only. 
Because we restricted this analysis to unambiguous homologues of 
high-confidence protein-coding genes in D. melanogaster, we are 
probably undercounting the number of genes that have arisen 
de novo in any particular lineage. However, ancestrally heterochro- 
matic genes that are currently euchromatic in D. melanogaster may 
spuriously seem to be lineage-specific. 
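
The gene counts in this paragraph form a simple partition, which the following worked-arithmetic sketch verifies. All numbers are taken from the text; the variable names and dictionary labels are illustrative only.

```python
# Counts quoted in the text.
unambiguous_proteins = 11_644
ancestral_at_genus_root = 11_348
excluded_complex_or_sophophora = 252

lineage_specific = {
    "melanogaster group": 14,
    "melanogaster subgroup only": 23,
    "melanogaster species complex": 3,
    "D. melanogaster only": 4,
}

non_ancestral = unambiguous_proteins - ancestral_at_genus_root
ancestral_pct = round(100 * ancestral_at_genus_root / unambiguous_proteins, 1)

print(non_ancestral)                                   # 296
print(non_ancestral - excluded_complex_or_sophophora)  # 44
print(sum(lineage_specific.values()))                  # 44
print(ancestral_pct)                                   # 97.5
```

The 296 non-ancestral genes split into the 252 excluded cases and the 44 lineage-specific genes, and the four lineage categories sum to the same 44.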

The 44 lineage-specific genes (Supplementary Table 11) differ 
from ancestrally present genes in several ways. They have a shorter 
median predicted protein length (lineage-specific median 177 amino 
acids, other median 421 amino acids, MWU, P=3.6 X10 '°), 
are more likely to be intronless (Fisher’s exact test (FET), P= 
6.2 X 10 °), and are more likely to be located in the intron of another 
gene on the opposite strand (FET, P = 3.5 X 10“). In addition, 18 of 
these 44 genes are testis- or accessory-gland-specific in D. melanoga- 
ster, a significantly greater fraction than is found in the ancestral set 
(FET, P= 1.25 X 10 “). This is consistent with previous 
observations that novel genes are often testis-specific in Drosophila, and 
expression studies on seven of the species show that species-restricted 
genes are more likely to exhibit male-biased expression. Further, 
these genes are significantly more tissue-specific in expression (as 
measured by τ; ref. 74) (MWU, P= 9.6 X 10° °), and this pattern is 
not solely driven by genes with testis-specific expression patterns. 


Protein-coding gene evolution 

Positive selection and selective constraints in Drosophila genomes. 
To study the molecular evolution of protein-coding genes, we esti- 
mated rates of synonymous and non-synonymous substitution in 
8,510 single-copy orthologues within the six melanogaster group 
species using PAML (Supplementary Information section 11.1); 
synonymous site saturation prevents analysis of more divergent com- 
parisons. We investigate only single-copy orthologues because when 
paralogues are included, alignments become increasingly proble- 
matic. Rates of amino acid divergence for single-copy orthologues 
in all 12 species were also calculated; these results are largely consist- 
ent with the analysis of non-synonymous divergence in the melano- 
gaster group, and are not discussed further. 

To understand global patterns of divergence and constraint 
across functional classes of genes, we examined the distributions of 
@ (=dy/ds, the ratio of non-synonymous to synonymous diver- 
gence) across Gene Ontology categories (GO)”, excluding GO 
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Figure 4 | Patterns of constraint and positive selection among GO terms. 
Distribution of average w per gene and the negative log; of the probability of 
positive selection (Supplementary Information section 11.2) for genes 

annotated with: a, biological process GO terms; b, cellular component GO 
terms; and ¢c, molecular function GO terms. Only GO terms with 200 or more 


ARTICLES 


annotations based solely on electronic support (Supplementary 
Information section 11.2). Most functional categories of genes are 
strongly constrained, with median estimates of ω much less than one.
In general, functionally similar genes are similarly constrained:
31.8% of GO categories have significantly lower variance in ω than
expected (q-value true-positive test). Only 11% of GO categories
had statistically significantly elevated ω (relative to the median of all
genes with GO annotations) at a 5% false-discovery rate (FDR),
suggesting either positive selection or a reduction in selective constraint.
The GO categories with elevated ω include the biological
process terms ‘defence response’, ‘proteolysis’, ‘DNA metabolic
process’ and ‘response to biotic stimulus’; the molecular function
terms ‘transcription factor activity’, ‘peptidase activity’, ‘receptor
binding’, ‘odorant binding’, ‘DNA binding’, ‘receptor activity’ and
‘G-protein-coupled receptor activity’; and the cellular location term
‘extracellular’ (Fig. 4, and Supplementary Table 12). Similar results
are obtained when dN is compared across GO categories, suggesting
that in most cases differences in ω among GO categories are driven by
amino acid rather than synonymous site substitutions. The two
exceptions are the molecular function terms ‘transcription factor
activity’ and ‘DNA binding activity’, for which we observe significantly
decelerated dS (FDR = 7.2 × 10-* for both; Supplementary
Information section 11.2) and no significant differences in dN.
genes annotated are plotted. See Supplementary Table 12 for median values
and significance. Note that most genes evolve under evolutionary constraint
at most of their sites, leading to low values of ω; even genes that experience
positive selection do not typically have an average ω across all codons that
exceeds one.

To distinguish possible positive selection from relaxed constraint,
we tested explicitly for genes that have a subset of codons with signatures
of positive selection, using codon-based likelihood models of
molecular evolution, implemented in PAML (Supplementary
Information section 11.1). Although this test is typically regarded
as a conservative test for positive selection, it may be confounded
by selection at synonymous sites. However, selection at synonymous
sites (that is, codon bias, see below) is quite weak. Moreover, variability
in ω presented here tends to reflect variability in dN. We
therefore believe that it is appropriate to treat synonymous sites as
nearly neutral and sites with ω > 1 as consistent with positive selection.
Despite a number of functional categories with evidence for
elevated ω, ‘helicase activity’ is the only functional category significantly
more likely to be positively selected (permutation test,
P = 2 × 10 *, FDR = 0.007; Supplementary Table 12); the biological
significance of this finding merits further investigation. Furthermore,
within each GO class, there is greater dispersion among genes in their
probability of positive selection than in their estimate of ω (MWU
one-tailed, P = 0.011; Supplementary Information section 11.1),
suggesting that although functionally similar genes share patterns
of constraint, they do not necessarily show similar patterns of
positive selection (Fig. 4).
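
As an illustration of the category-level comparison described above, the following Python sketch computes ω = dN/dS per gene and adjusts a set of P-values to a false-discovery rate. It is a minimal sketch rather than the pipeline used here: the Benjamini-Hochberg procedure is assumed as a stand-in for the q-value method cited in the text, and the function names `omega` and `benjamini_hochberg` are hypothetical.

```python
def omega(dn, ds):
    """Return dN/dS, or None when dS is zero and the ratio is undefined."""
    if ds <= 0:
        return None
    return dn / ds


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order.

    Assumption: BH is used here only as a familiar FDR procedure; it is
    not necessarily the q-value method referred to in the text.
    """
    m = len(pvalues)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so that adjusted values
    # never increase as the raw P-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_categories(category_pvalues, fdr=0.05):
    """Return the names of categories whose adjusted P-value is at most fdr."""
    names = list(category_pvalues)
    adjusted = benjamini_hochberg([category_pvalues[n] for n in names])
    return [n for n, q in zip(names, adjusted) if q <= fdr]
```

Calling `significant_categories` with a dictionary that maps GO term names to raw P-values returns the terms that pass the chosen FDR threshold; genes with dS of zero are best dropped before any per-category summary of ω is taken.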

Interestingly, protein-coding genes with no annotated
(‘unknown’) function in the GO database seem to be less constrained
(permutation test, P < 1 × 10 *, FDR = 0.006) and to have on
average lower P-values for the test of positive selection than genes
with annotated functions (permutation test, P = 0.001, FDR =
0.058). It is unlikely that this observation results entirely from an
over-representation of mis-annotated or non-protein-coding genes
in the ‘unknown’ functional class, because this finding is robust to the
removal of all D. melanogaster genes predicted to be non-protein-coding
in ref. 8. The bias in the way biological function is ascribed
to genes (towards laboratory-induced, easily scorable functions) leaves
open the possibility that unannotated biological functions may have
an important role in evolution. Indeed, genes with characterized
mutant alleles in FlyBase evolve significantly more slowly than other
genes (median ω with alleles = 0.0525 and ω without alleles = 0.0701; MWU,
P<1xX10 '°). 
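
Several comparisons in this section rely on permutation tests. A minimal sketch of a two-sided permutation test for a difference in median ω between two gene sets follows. The choice of test statistic, the number of permutations and the function name `permutation_test_median` are assumptions made for illustration, because the exact procedure is described only in the Supplementary Information.

```python
import random
from statistics import median


def permutation_test_median(group_a, group_b, n_permutations=10000, seed=0):
    """Two-sided permutation P-value for a difference in medians.

    group_a and group_b are sequences of per-gene values (for example ω).
    Labels are shuffled n_permutations times; the P-value is the share of
    shuffles whose absolute median difference is at least the observed one.
    """
    rng = random.Random(seed)
    observed = abs(median(group_a) - median(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(median(pooled[:n_a]) - median(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # The add-one correction keeps the estimate from being exactly zero.
    return (hits + 1) / (n_permutations + 1)
```

Because the smallest reportable value is 1 / (n_permutations + 1), very small P-values require a correspondingly large number of permutations.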

Previous work has suggested that a substantial fraction of non-synonymous
substitutions in Drosophila were fixed through positive
selection. We estimate that 33.1% of single-copy orthologues in
the melanogaster group have experienced positive selection on at least
a subset of codons (q-value true-positive tests) (Supplementary
Information section 11.1). This may be an underestimate, because
we have only examined single-copy orthologues, owing to difficulties
in producing accurate alignments of paralogues by automated methods.
On the basis of the 878 genes inferred to have experienced
positive selection with high confidence (FDR < 10%), we estimated
that an average of 2% of codons in positively selected genes have
ω > 1. Thus, several lines of evidence, based on different methodologies,
suggest that patterns of amino acid fixation in Drosophila
genomes have been shaped extensively by positive selection.

The presence of functional domains within a protein may lead to
heterogeneity in patterns of constraint and adaptation along its
length. Among genes inferred to be evolving by positive selection
at a 10% FDR, 63.7% (q-value true-positive tests) show evidence
for spatial clustering of positively selected codons (Supplementary
Information section 11.2). Spatial heterogeneity in constraint is further
supported by contrasting ω for codons inside versus outside
defined InterPro domains (genes lacking InterPro domains are
treated as ‘outside’ a defined InterPro domain). Codons within
InterPro domains were significantly more conserved than codons
outside InterPro domains (median ω: 0.062 inside InterPro domains,
0.084 outside InterPro domains; MWU, P < 2.2 × 10⁻¹⁶; Supplementary
Information section 11.2). Similarly, there were significantly
more positively selected codons outside InterPro domains than
inside domains (FET, P < 2.2 × 10⁻¹⁶), suggesting that in addition to
being more constrained, codons in protein domains are less likely to
be targets of positive selection (Supplementary Fig. 6).

Factors affecting the rate of protein evolution in Drosophila. The 
sequenced genomes of the melanogaster group provide unpreced- 
ented statistical power to identify factors affecting rates of protein 
evolution. Previous analyses have suggested that although the
level of gene expression consistently seems to be a major determinant
of variation in rates of evolution among proteins, other factors
probably play a significant, if perhaps minor, part. In Drosophila,
although highly expressed genes do evolve more slowly, breadth of
expression across tissues, gene essentiality and intron number all also
independently correlate with rates of protein evolution, suggesting
that the additional complexities of multicellular organisms are
important factors in modulating rates of protein evolution. The
presence of repetitive amino acid sequences has a role as well: non-repeat
regions in proteins containing repeats evolve faster and show
more evidence for positive selection than genes lacking repeats.

These data also provide a unique opportunity to examine the 
impact of chromosomal location on evolutionary rates. Population 
genetic theory predicts that for new recessive mutations, both 
purifying and positive selection will be more efficient on the 
X chromosome given its hemizygosity in males. In contrast, the lack
of recombination on the small, mainly heterochromatic dot chromosome
is expected to reduce the efficacy of selection. Because
codon bias, or the unequal usage of synonymous codons in protein-coding
sequences, reflects weak but pervasive selection, it is a sensitive
metric for evaluating the efficacy of purifying selection.
Consistent with expectation, in all 12 species, we find significantly
elevated levels of codon bias on the X chromosome and significantly
reduced levels of codon bias on the dot chromosome. Furthermore,
X-chromosome-linked genes are marginally over-represented within 
the set of positively selected genes in the melanogaster group (FET, 
P= 0.055), which is consistent with increased rates of adaptive sub- 
stitution on this chromosome. This analysis suggests that chromo- 
somal context also serves to modulate rates of molecular evolution in 
protein-coding genes. 
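
The over-representation of X-linked genes among positively selected genes is assessed with Fisher's exact test (FET). The sketch below shows how a one-sided version of that test can be computed from a 2 × 2 table using only the hypergeometric distribution. Whether the reported P-value is one- or two-sided is not stated in the text, so the one-sided form is an assumption, and the function name `fisher_exact_greater` is hypothetical.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for enrichment in a 2x2 table.

    Table layout (counts of genes):
                         selected   not selected
        X-linked             a           b
        autosomal            c           d

    Returns the probability of seeing a or more X-linked selected genes
    when the row and column totals are held fixed.
    """
    row1 = a + b          # all X-linked genes
    col1 = a + c          # all positively selected genes
    n = a + b + c + d     # all genes in the table
    denom = comb(n, col1)
    upper = min(row1, col1)
    # Sum hypergeometric probabilities for every outcome at least as extreme.
    numer = sum(
        comb(row1, k) * comb(n - row1, col1 - k)
        for k in range(a, upper + 1)
    )
    return numer / denom
```

Integer arithmetic in `math.comb` keeps the calculation exact until the final division, which matters when counts run into the thousands.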

To examine further the impact of genomic location on protein 
evolution, we examined the subset of genes that have moved within 
or between chromosome arms. Genes inferred to have moved
between Muller elements have a significantly higher rate of protein
evolution than genes inferred to have moved within a Muller element
(MWU, P = 1.32 × 10 '*) and genes that have maintained
their genomic position (MWU, P = 0.008) (Supplementary Fig. 7).
Interestingly, genes that move within Muller elements have a significantly
lower rate of protein evolution than those for which genomic
locations have been maintained (MWU, P = 3.85 × 10 1). It
remains unclear whether these differences reflect underlying biases 
in the types of genes that move inter- versus intra-chromosomally, or 
whether they are due to in situ patterns of evolution in novel genomic 
contexts. 

Codon bias. Codon bias is thought to enhance the efficiency and/or
accuracy of translation and seems to be maintained by
mutation–selection–drift balance. Across the 12 Drosophila genomes,
there is more codon bias in the Sophophora subgenus than in the
Drosophila subgenus, and a previously noted striking reduction
in codon bias in D. willistoni (Fig. 5). However, with only minor
exceptions, codon preferences for each amino acid seem to be conserved
across 11 of the 12 species. The striking exception is D. willistoni,
in which codon usage for 6 of 18 redundant amino acids has
diverged (Fig. 5). Mutation alone is not sufficient to explain codon-usage
bias in D. willistoni, which is suggestive of a lineage-specific
shift in codon preferences. We found evidence for a lineage-specific
genomic reduction in codon bias in D. melanogaster
(Fig. 5), as has been suggested previously. In addition, maximum-likelihood
estimation of the strength of selection on synonymous
sites in 8,510 melanogaster group single-copy orthologues
revealed a marked reduction in the number of genes under selection
for increased codon bias in D. melanogaster relative to its sister species
D. sechellia.

Evolution of genes associated with ecology and reproduction. 
Given the ecological and environmental diversity encompassed by 
the 12 Drosophila species, we examined the evolution of genes and 
gene families associated with ecology and reproduction. Specifically, 
we selected genes with roles in chemoreception, detoxification/ 
metabolism, immunity/defence, and sex/reproduction for more 
detailed study. 

Chemoreception. Drosophila species have complex olfactory and 
gustatory systems used to identify food sources, hazards and mates, 
which depend on odorant-binding proteins, and olfactory/odorant 
and gustatory receptors (Ors and Grs). The D. melanogaster genome 
has approximately 60 Ors, 60 Grs and 50 odorant-binding protein 
genes. Despite overall conservation of gene number across the 12 
species and widespread evidence for purifying selection within the 
melanogaster group, there is evidence that a subset of Or and Gr genes
experiences positive selection. Furthermore, clear lineage-specific
differences are detectable between generalist and specialist
species within the melanogaster subgroup. First, the two independently
evolved specialists (D. sechellia and D. erecta) are losing Gr
genes approximately five times more rapidly than the generalist species.
We believe this result is robust to sequence quality, because
all pseudogenes and deletions were verified by direct re-sequencing 
and synteny-based orthologue searches, respectively. Generalists are 
expected to encounter the most diverse set of tastants and seem to 
have maintained the greatest diversity of gustatory receptors. Second, 
Or and Gr genes that remain intact in D. sechellia and D. erecta evolve 
significantly more rapidly along these two lineages (ω = 0.1556 for
Ors and 0.1874 for Grs) than along the generalist lineages
(ω = 0.1049 for Ors and 0.1658 for Grs; paired Wilcoxon,
P = 0.0003 and 0.003, respectively). There is some evidence that
odorant-binding protein genes also evolve significantly faster in specialists
compared to generalists. This elevated ω reflects a trend
observed throughout the genomes of the two specialists and is likely
to result, at least in part, from demographic phenomena. However,
the difference between specialist and generalist ω for Or/Gr genes
(0.0292) is significantly greater than the difference for genes across
the genome (0.0091; MWU, P = 0.0052), suggesting a change in
selective regime. Moreover, the observation that elevated ω as well as
accelerated gene loss disproportionately affect groups of Or and Gr
genes that respond to specific chemical ligands and/or are expressed
during specific life stages suggests that rapid evolution at Or/Gr loci
in specialists is related to the ecological shifts these species have
sustained.

Figure 5 | Deviations in codon bias from D. melanogaster in 11 Drosophila
species. The upper panel depicts differences in ENC (effective number of
codons) between D. melanogaster and the 11 non-melanogaster species,
calculated on a gene-by-gene basis. Note that increasing levels of ENC
indicate a decrease in codon bias. The Sophophora subgenus in general has
higher levels of codon bias than the Drosophila subgenus with the exception
of D. willistoni, which shows a dramatic reduction in codon bias. The lower
panel shows the 7 codons for which preference changes across the 12
Drosophila species. A dot indicates identical codon preference to D.
melanogaster; otherwise the preferred codon is indicated.

Detoxification/metabolism. The larval food sources for many
Drosophila species contain a cocktail of toxic compounds, and consequently
Drosophila genomes encode a wide variety of detoxification
proteins. These include members of the cytochrome P450 (P450),
carboxyl/choline-esterase (CCE) and glutathione S-transferase
(GST) multigene families, all of which also have critical roles in
resistance to insecticides. Among the P450s, the five enzymes
associated with insecticide resistance are highly dynamic across the
phylogeny, with 24 duplication events and 4 loss events since the last
common ancestor of the genus, which is in striking contrast to genes
with known developmental roles, eight of which are present as a
single copy in all 12 species (C. Robin, personal communication).
As with chemoreceptors, specialists seem to lose detoxification genes
at a faster rate than generalists. For instance, D. sechellia has lost the
most P450 genes; these 14 losses comprise almost one-third of all
P450 loss events (Supplementary Table 13) (C. Robin, personal
communication). Positive selection has been implicated in detoxification-gene
evolution as well, because a search for positive
selection among GSTs identified the parallel evolution of a radical
glycine to lysine amino acid change in GSTD1, an enzyme known to
degrade DDT. Finally, although metabolic enzymes in general are
highly constrained (median ω = 0.045 for enzymes, 0.066 for non-enzymes;
MWU, P=5.7 X 10), enzymes involved in xenobiotic
metabolism evolve significantly faster than other enzymes (median
ω = 0.05 for the xenobiotic group versus ω = 0.045 overall,
two-tailed permutation test, P = 0.0110; A. J. Greenberg, personal
communication).

Metazoans deal with excess selenium in the diet by sequestration in 
selenoproteins, which incorporate the rare amino acid selenocysteine 
(Sec) at sites specified by the TGA codon. The recoding of the normally
terminating signal TGA as a Sec codon is mediated by the selenocysteine
insertion sequence (SECIS), a secondary structure in the
3′ UTR of selenoprotein messenger RNAs. All animals examined so
far have selenoproteins; three have been identified in D. melanogaster
(SELG, SELM and SPS2). Interestingly, although the three
known melanogaster selenoproteins are all present in the genomes 
of the other Drosophila species, in D. willistoni the TGA Sec codons 
have been substituted by cysteine codons (TGT/TGC). Consistent 
with this finding, analysis of the seven genes implicated to date in 
selenoprotein synthesis including the Sec-specific tRNA suggests that 
most of these genes are absent in D. willistoni (R. Guigo, personal 
communication). D. willistoni thus seems to be the first animal 
known to lack selenoproteins. If correct, this observation is all the 
more remarkable given the ubiquity of selenoproteins and the seleno- 
protein biosynthesis machinery in metazoans, the toxicity of excess 
selenium, and the protection from oxidative stress mediated by 
selenoproteins. However, it remains possible that this species 
encodes selenoproteins in a different way, and this represents an 
exciting avenue of future research. 

Immunity/defence. Drosophila, like all insects, possesses an innate 
immune system with many components analogous to the innate 
immune pathways of mammals, although it lacks an antibody- 
mediated adaptive immune system. Immune system genes often
evolve rapidly and adaptively, driven by selection pressures from
pathogens and parasites. The genus Drosophila is no exception:
immune system genes evolve more rapidly than non-immune genes,
showing both high total divergence rates and specific signs of positive
selection. In particular, 29% of receptor genes involved in phagocytosis
seem to evolve under positive selection, suggesting that
molecular co-evolution between Drosophila pattern recognition
receptors and pathogen antigens is driving adaptation in the immune
system. Somewhat surprisingly, genes encoding effector proteins
such as antimicrobial peptides are far less likely to exhibit adaptive 
sequence evolution. Only 5% of effector genes (and no antimicrobial 
peptides) show evidence of adaptive evolution, compared to 10% of 
genes genome-wide. Instead, effector genes seem to evolve by rapid 
duplication and deletion. Whereas 49% of genes genome-wide, 63%
of genes involved in pathogen recognition and 81% of genes implicated
in immune-related signal transduction can be found as single-copy
orthologues in all 12 species, only 40% of effector genes exist as
single-copy orthologues across the genus (χ² = 41.13, P = 2.53 ×
10 *), suggesting rapid radiation of effector protein classes along
particular lineages. Thus, much of the Drosophila immune system
seems to evolve rapidly, although the mode of evolution varies across
immune-gene functional classes.

Sex/reproduction. Genes encoding sex- and reproduction-related 
proteins are subject to a wide array of selective forces, including 
sexual conflict, sperm competition and cryptic female choice, and 
to the extent that these selective forces are of evolutionary consequence,
this should lead to rapid evolution in these genes (for
an overview see refs 137, 138). The analysis of 2,505 sex- and
reproduction-related genes within the melanogaster group indicated
that male sex- and reproduction-related genes evolve more rapidly at
the protein level than genes not involved in sex or reproduction or
than female sex- and reproduction-related genes (Supplementary
Fig. 8). Positive selection seems to be at least partially responsible
for these patterns, because genes involved in spermatogenesis have
significantly stronger evidence for positive selection than do non-spermatogenesis
genes (permutation test, P = 0.0053). Similarly,
genes that encode components of seminal fluid have significantly
stronger evidence for positive selection than ‘non-sex’ genes.
Moreover, protein-coding genes involved in male reproduction,
especially seminal fluid and testis genes, are particularly likely to be
lost or gained across Drosophila species.

Evolutionary forces in the mitochondrial genome. Functional ele- 
ments in mtDNA are strongly conserved, as expected: tRNAs are 
relatively more conserved than the mtDNA overall (average pairwise 
nucleotide distance = 0.055 substitutions per site for tRNAs versus 
0.125 substitutions per site overall). We observe a deficit of substitu- 
tions occurring in the stem regions of the stem-loop structure in 
tRNAs, consistent with strong selective pressure to maintain RNA 
secondary structure, and there is a strong signature of purifying 
selection in protein-coding genes. However, despite their shared
role in aerobic respiration, there is marked heterogeneity in the rates
of amino acid divergence between the oxidative phosphorylation
enzyme complexes across the 12 species (NADH dehydrogenase,
0.059 > ATPase, 0.042 > CytB, 0.037 > cytochrome oxidase, 0.020;
mean pairwise dN), which contrasts with the relative homogeneity in
synonymous substitution rates. A model with distinct substitution
rates for each enzyme complex rather than a single rate provides a
significantly better fit to the data (P < 0.0001), suggesting complex-specific
selective effects of mitochondrial mutations.


Non-coding sequence evolution 

ncRNA sequence evolution. The availability of complete sequence 
from 12 Drosophila genomes, combined with the tractability of RNA 
structure predictions, offers the exciting opportunity to connect pat- 
terns of sequence evolution directly with structural and functional 
constraints at the molecular level. We tested models of RNA evolu- 
tion focusing on specific ncRNA gene classes in addition to inferring 
patterns of sequence evolution using more general datasets that are 
based on predicted intronic RNA structures. 

The exquisite simplicity of miRNAs and their shared stem-loop 
structure makes these ncRNAs particularly amenable to evolutionary 
analysis. Most miRNAs are highly conserved within the Drosophila 
genus: for the 71 previously described miRNA genes inferred to 
be present in the common ancestor of these 12 species, mature 
miRNA sequences are nearly invariant. However, we do find a small 
number of substitutions and a single deletion in mature miRNA 
sequences (Supplementary Table 14), which may have functional 
consequences for miRNA-target interactions and may ultimately 
help identify targets through sequence covariation. Pre-miRNA 
sequences are also highly conserved, evolving at about 10% of the 
rate of synonymous sites. 

To link patterns of evolution with structural constraints, we
inferred ancestral pre-miRNA sequences and deduced secondary
structures at each ancestral node on the phylogeny (Supplementary
Information section 12.1). Although conserved miRNA genes show
little structural change (little change in free energy), the five melanogaster
group-specific miRNA genes (miR-303 and the mir-310/311/312/313
cluster) have undergone numerous changes across the entire
pre-miRNA sequence, including the ordinarily invariant mature
miRNA. Patterns of polymorphism and divergence in these lineage-specific
miRNA genes, including a high frequency of derived
mutations, are suggestive of positive selection. Although lineage-specific
miRNAs may evolve under less constraint because they have
fewer target transcripts in the genome, it is also possible that recent
integration into regulatory networks causes accelerated rates of
miRNA evolution.

We further investigated patterns of sequence evolution for the 
subset of 38 conserved pre-miRNAs with mature miRNA sequences 
at their 3’ end by calculating evolutionary rates in distinct site classes 
(Fig. 6, and Supplementary Information section 12.2). Outside the 
mature miRNA and its complementary sequence, loops had the high- 
est rate of evolution, followed by unpaired sites, with paired sites 
having the lowest rate of evolution. Inside the mature miRNA, 
unpaired sites evolve more slowly than paired sites, whereas the 
opposite is true for the sequence complementary to the mature 
miRNA. Surprisingly, a large fraction of unpaired bulges or internal 
loops in the mature miRNA seem to be conserved—a pattern which 
may have implications for models of miRNA biogenesis and the 
degree of mismatch allowed in miRNA-target prediction methods. 
Overall these results support the qualitative model proposed in ref. 
141 for the canonical progression of miRNA evolution, and show that 
functional constraints on the miRNA itself supersede structural con- 
straints imposed by maintenance of the hairpin-loop. 

To assess constraint on stem regions of RNA structures more 
generally, we compared substitution rates in stems (S) to those in 
nominally unconstrained loop regions (L) in a wide variety of 
ncRNAs (Supplementary Information section 12.3). We estimated 
substitution rates using a maximum likelihood framework, and com- 
pared the observed L/S ratio with the average L/S ratio estimated 
from published secondary structures in RFAM, which we normalized 
to 1.0. L/S ratios for Drosophila ncRNA families range from a highly 
constrained 2.57 for the nuclear RNase P family to 0.56 for the 5S 
ribosomal RNA (Supplementary Table 15). 


Figure 6 | Substitution rate of site classes within miRNAs. Bootstrap
distributions of miRNA substitution rates. Structural alignments of miRNA
precursor hairpins were partitioned into six site-classes (inset): (1) hairpin
loops; unpaired sites (2) outside, (3) in the complementary region of, and (4)
inside the miRNA; and base pairs (5) adjacent to and (6) involving the
miRNA. Whiskers show approximate 95% confidence intervals for median
differences, boxes show interquartile range. [Axis labels: substitution rate
relative to class 1; site class.]

Finally, we predicted a set of conserved intronic RNA structures
and analysed patterns of compensatory nucleotide substitution in
D. melanogaster, D. yakuba, D. ananassae, D. pseudoobscura, D. virilis
and D. mojavensis (Supplementary Information section 13). Signatures
of compensatory evolution in RNA helices are detected as
covarying nucleotide sites or ‘covariations’ (that is, two Watson–Crick
bases that interact in species A replaced by a different
Watson–Crick pair in species B). The number of covariations (per
base pair of a helix) depends on the physical distance between the
interacting nucleotides (Supplementary Fig. 9), as has been observed
for the RNA helices in the Drosophila bicoid 3′ UTR region. Short-range
pairings exhibit a higher average number of covariations with a
larger variance among helices than longer-range pairings. The
decrease in rate of covariation with increasing distance may be
explained by physical properties of a helix, which may impose selective
constraints on the evolution of covarying nucleotides within a
helix. Alternatively, if individual mutations at each locus are deleterious
but compensated by mutations at a second locus, given sufficiently
strong selection against the first deleterious mutation these
epistatic fitness interactions could generate the observed distance
effect.
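
The definition of a covariation given above can be made concrete with a short Python sketch that counts, for two aligned RNA sequences and a list of paired positions, the helix positions at which one Watson–Crick pair has been replaced by a different Watson–Crick pair. This is an illustration under stated assumptions: only the four canonical pairs are accepted (G–U wobble pairs are excluded), positions are zero-based alignment columns, and the function names are hypothetical.

```python
# Canonical Watson-Crick pairs in RNA; G-U wobble pairs are deliberately
# left out, which is an assumption of this sketch.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def count_covariations(seq_a, seq_b, pairs):
    """Count helix base pairs that covary between two aligned sequences.

    seq_a, seq_b: aligned RNA sequences of equal length (species A and B).
    pairs: list of (i, j) zero-based positions that base-pair in the helix.
    A covariation is a position pair that is Watson-Crick in both species
    but carries a different pair in each.
    """
    count = 0
    for i, j in pairs:
        pair_a = (seq_a[i].upper(), seq_a[j].upper())
        pair_b = (seq_b[i].upper(), seq_b[j].upper())
        if pair_a in WATSON_CRICK and pair_b in WATSON_CRICK and pair_a != pair_b:
            count += 1
    return count


def covariations_per_base_pair(seq_a, seq_b, pairs):
    """Return covariations divided by the number of base pairs in the helix."""
    if not pairs:
        return 0.0
    return count_covariations(seq_a, seq_b, pairs) / len(pairs)
```

Dividing by the number of base pairs gives the per-base-pair quantity that the text relates to the distance between interacting nucleotides.
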

Evolution of cis-regulatory DNAs. Comparative analyses of cis-regulatory
sequences may provide insights into the evolutionary
forces acting on regulatory components of genes, shed light on the
constraints of the cis-regulatory code and aid in annotation of
new regulatory sequences. Here we rely on two recently compiled
databases, and present results comparing cis-regulatory modules
and transcription factor binding sites (derived from DNaseI footprints)
between D. melanogaster and D. simulans (Supplementary
Information section 8). We estimated mean selective constraint (C,
the fraction of mutations removed by natural selection) relative to
the ‘fastest evolving intron’ sites at the 5′ end of short introns, which
represent putatively unconstrained neutral standards (Supplementary
Information section 8.2). Note that this approach ignores the
contribution of positively selected sites, potentially underestimating
the fraction of functionally relevant sites.
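
One common way to express selective constraint of this kind is C = 1 − O/E, where O is the number of substitutions observed in the sequence class of interest and E is the number expected if those sites evolved at the neutral reference rate. The Python sketch below uses that form together with a percentile bootstrap over elements. It is an assumption-laden illustration: the exact estimator is defined in the Supplementary Information rather than here, and the function names, the pooled estimate and the resampling unit are all hypothetical choices.

```python
import random


def constraint(observed_subs, sites, neutral_rate):
    """Return C = 1 - O/E, with E = sites * neutral_rate.

    neutral_rate is the per-site substitution rate at the neutral
    reference sites (for example, fast-evolving intron sites).
    """
    expected = sites * neutral_rate
    return 1.0 - observed_subs / expected


def bootstrap_constraint(elements, neutral_rate, n_boot=1000, seed=0):
    """Pooled constraint with a 95% percentile bootstrap interval.

    elements: list of (observed_subs, sites) tuples, one per element
    (for example, one per footprint). Elements are resampled with
    replacement; this resampling unit is an assumption of the sketch.
    Returns (point_estimate, lower, upper).
    """
    rng = random.Random(seed)

    def pooled(sample):
        subs = sum(s for s, _ in sample)
        sites = sum(n for _, n in sample)
        return constraint(subs, sites, neutral_rate)

    point = pooled(elements)
    estimates = sorted(
        pooled([rng.choice(elements) for _ in elements])
        for _ in range(n_boot)
    )
    lower = estimates[int(0.025 * n_boot)]
    upper = estimates[int(0.975 * n_boot) - 1]
    return point, lower, upper
```

A value of C near 0 indicates evolution at roughly the neutral rate, and a value near 1 indicates that almost all mutations have been removed.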

Consistent with previous findings, Drosophila cis-regulatory 
sequences are highly constrained. Mean constraint within cis-regulatory
modules is 0.643 (95% bootstrap confidence interval
= 0.621–0.662) and within footprints is 0.692 (0.655–0.723),
both of which are significantly higher than mean constraint in
non-coding DNA overall (0.555 (0.546–0.563)) and significantly
lower than constraint at non-degenerate coding sites (0.862
(0.856–0.868)) and ncRNA genes (0.864 (0.846–0.880)) (Supplementary
Fig. 10). The high level of constraint in cis-regulatory
sequences also extends into flanking sequences, only declining to
constraint levels typical of non-coding DNA 40 bp away. This is
consistent with previous findings that transcription factor binding
sites tend to be found in larger blocks of constraint that cluster to
form cis-regulatory modules. To understand selective constraints
on nucleotides within cis-regulatory sequences that have direct contact
with transcription factors, we estimated the selective constraint
for the best match to position weight matrices within each footprint;
core motifs in transcription-factor-binding sites have a
mean constraint of 0.773 (0.729–0.814), significantly greater than
the mean for the footprints as a whole, and approaching the level
of constraint found at non-degenerate coding sites and in ncRNA
genes (Supplementary Fig. 10).

We next examined the variation in selective constraint across cis-regulatory
sequences. Surprisingly, we find no evidence that selective
constraint is correlated with predicted transcription-factor-binding
strength (estimated as the position weight matrix score P-value)
(Spearman’s r = 0.0681, P = 0.0609). We observe significant variation
in constraint both among target genes (Kruskal–Wallis tests,
footprints, P < 0.0001; and position weight matrix matches within
footprints, P = 0.0023) and among chromosomes (cis-regulatory
modules, P = 0.0186; footprints, P = 0.0388; and position weight
matrix matches within footprints, P = 0.0108; Supplementary
Table 16).


Discussion and conclusion 


Each new genome sequence affords novel opportunities for compar- 
ative genomic inference. What makes the analysis of these 12 
Drosophila genomes special is the ability to place every one of these 
genomic comparisons on a phylogeny with a taxon separation that is 
ideal for asking a wealth of questions about evolutionary patterns and 
processes. It is without question that this phylogenomic approach 
places additional burdens on bioinformatics efforts, multiplying the 
amount of data many-fold, requiring extra care in generating multi- 
species alignments, and accommodating the reality that not all gen- 
ome sequences have the same degree of sequencing or assembly 
accuracy. These difficulties notwithstanding, phylogenomics has 
extraordinary advantages not only for the analyses that are possible, 
but also for the ability to produce high-quality assemblies and accur- 
ate annotations of functional features in a genome by using closely 
related genomes as guides. The use of multi-species orthology pro- 
vides especially convincing evidence in support of particular gene 
models, not only for protein-coding genes, but also for miRNA 
and other ncRNA genes. 

Many attributes of the genomes of Drosophila are remarkably con- 
served across species. Overall genome size, number of genes, distri- 
bution of transposable element classes, and patterns of codon usage 
are all very similar across these 12 genomes, although D. willistoni is 
an exceptional outlier by several criteria, including its unusually 
skewed codon usage, increased transposable element content and 
potential lack of selenoproteins. At a finer scale, the number of struc- 
tural changes and rearrangements is much larger; for example, there 
are several different rearrangements of genes in the Hox cluster found 
in these Drosophila species. 

The vast majority of multigene families are found in all 12 gen- 
omes, although gene family size seems to be highly dynamic: almost 
half of all gene families change in size on at least one lineage, and a 
noticeable fraction shows rapid and lineage-specific expansions and 
contractions. Particularly notable are cases consistent with adaptive 
hypotheses, such as the loss of Gr genes in ecological specialists and 
the lineage-specific expansions of antimicrobial peptides and other 
immune effectors. All species were found to have novel genes not seen 
in other species. Although lineage-specific genes are challenging to 
verify computationally, we can confirm at least 44 protein-coding 
genes unique to the melanogaster group, and these proteins have very 
different properties from ancestral proteins. Similarly, although the 
relative abundance of transposable element subclasses across these 
genomes does not differ dramatically, total genomic transposable 
element content varies substantially among species, and several 
instances of lineage-specific transposable elements were discovered. 

There is considerable variation among protein-coding genes in 
rates of evolution and patterns of positive selection. Functionally 
similar proteins tend to evolve at similar rates, although variation 
in genomic features such as gene expression level, as well as chromo- 
somal location, are also associated with variation in evolutionary rate 
among proteins. Whereas broad functional classes do not seem to 
share patterns of positive selection, and although very few GO cat- 
egories show excesses of positive selection, a number of genes 
involved in interactions with the environment and in sex and repro- 
duction do show signatures of adaptive evolution. It thus seems likely 
that adaptation to changing environments, as well as sexual selection, 
shape the evolution of protein-coding genes. 

Annotation of ncRNA genes across all 12 species allows com- 
prehensive analysis of the evolutionary divergence of these genes. 
MicroRNA genes in particular are more conserved than protein- 
coding genes with respect to their primary DNA sequence, and the 
substitutions that do occur often have compensatory changes such 
that the average estimated free energy of the folding structures 
remains remarkably constant across the phylogeny. Surprisingly, 
mismatches in miRNAs seem to be highly conserved, which may
impact models of miRNA biogenesis and target recognition. 
Lineage-restricted miRNAs, however, have considerably elevated 
rates of change, suggesting either reduced constraint due to novel 
miRNAs having fewer targets, or adaptive evolution of evolutionarily 
young miRNAs. 

Virtually any question about the function of genome features in 
Drosophila is now empowered by being embedded in the context of 
this 12-species phylogeny, allowing an analysis of the ways by which
evolution has tuned myriad biological processes across the hundreds 
of millions of years spanned in total by this phylogeny. The analyses 
presented herein have generated more questions than they have 
answered, and these results represent a small fraction of that which 
is possible. Because much of this rich and extraordinary comparative 
genomic dataset remains to be explored, we believe that these 12 
Drosophila genome sequences will serve as a powerful tool for glean- 
ing further insight into genetic, developmental, regulatory and evolu- 
tionary processes. 


METHODS 


The full methods for this paper are described in Supplementary Information. 
Here, we describe the datasets generated by this project and their availability. 
Genomic sequence. Scaffolds and assemblies for all genomic sequence generated 
by this project are available from GenBank (Supplementary Tables 4 and 5), and 
FlyBase (ftp://ftp.flybase.net/12_species_analysis/). Genome browsers are 
available from UCSC (http://genome.ucsc.edu/cgi-bin/hgGateway?hgsid=98180333&clade=insect&org=0&db=0)
and FlyBase (http://flybase.org/cgi-bin/gbrowse/dmel/). BLAST search of these genomes is available at FlyBase
(http://flybase.org/blast). 

Predicted gene models. Consensus gene predictions for the 11 non-melanoga- 
ster species, produced by combining several different GLEAN runs that weight 
homology evidence more or less strongly, are available from FlyBase as GFF files 
for each species (ftp://ftp.flybase.net/12_species_analysis/). These gene models 
can also be accessed from the Genome Browser in FlyBase (Gbrowse;
http://flybase.org/cgi-bin/gbrowse/dmel/). Predictions of non-protein-coding genes
are also available in GFF format for each species, from FlyBase
(ftp://ftp.flybase.net/12_species_analysis/).

Homology. Multiway homology assignments are available from FlyBase
(ftp://ftp.flybase.net/12_species_analysis/), and also in the Genome Browser
(Gbrowse).

Alignments. All alignment sets produced are available in FASTA format from 
FlyBase (ftp://ftp.flybase.net/12_species_analysis/). 

PAML parameters. Output from PAML models for the alignments of single 
copy orthologues in the melanogaster group, including the q-value for the 
test for positive selection, is available from FlyBase
(ftp://ftp.flybase.net/12_species_analysis/).
Discovery of functional elements in 12
Drosophila genomes using evolutionary 
signatures 


Alexander Stark, Michael F. Lin, Pouya Kheradpour, Jakob S. Pedersen, Leopold Parts,
Joseph W. Carlson, Madeline A. Crosby, Matthew D. Rasmussen, Sushmita Roy, Ameya N. Deoras,
J. Graham Ruby, Julius Brennecke, Harvard FlyBase curators, Berkeley Drosophila Genome Project,
Emily Hodges, Angie S. Hinrichs, Anat Caspi, Benedict Paten, Seung-Won Park, Mira V. Han,
Morgan L. Maeder, Benjamin J. Polansky, Bryanne E. Robson, Stein Aerts, Jacques van Helden,
Bassem Hassan, Donald G. Gilbert, Deborah A. Eastman, Michael Rice, Michael Weir,
Matthew W. Hahn, Yongkyu Park, Colin N. Dewey, Lior Pachter, W. James Kent, David Haussler,
Eric C. Lai, David P. Bartel, Gregory J. Hannon, Thomas C. Kaufman, Michael B. Eisen,
Andrew G. Clark, Douglas Smith, Susan E. Celniker, William M. Gelbart & Manolis Kellis


Sequencing of multiple related species followed by comparative genomics analysis constitutes a powerful approach for the 
systematic understanding of any genome. Here, we use the genomes of 12 Drosophila species for the de novo discovery of 
functional elements in the fly. Each type of functional element shows characteristic patterns of change, or ‘evolutionary 
signatures’, dictated by its precise selective constraints. Such signatures enable recognition of new protein-coding genes and 
exons, spurious and incorrect gene annotations, and numerous unusual gene structures, including abundant stop-codon 
readthrough. Similarly, we predict non-protein-coding RNA genes and structures, and new microRNA (miRNA) genes. We 
provide evidence of miRNA processing and functionality from both hairpin arms and both DNA strands. We identify several 
classes of pre- and post-transcriptional regulatory motifs, and predict individual motif instances with high confidence. We 
also study how discovery power scales with the divergence and number of species compared, and we provide general 
guidelines for comparative studies. 




The sequencing of the human genome and the genomes of dozens of 
other metazoan species has intensified the need for systematic meth- 
ods to extract biological information directly from DNA sequence. 
Comparative genomics has emerged as a powerful methodology for 
this endeavour. Comparison of a few (two to four) closely related gen-
omes has proven successful for the discovery of protein-coding
genes, RNA genes, miRNA genes and catalogues of regulatory
elements. The resolution and discovery power of these studies
should increase with the number of genomes, in principle enab-
ling the systematic discovery of all conserved functional elements.
The fruitfly Drosophila melanogaster is an ideal system for deve- 
loping and evaluating comparative genomics methodologies. Over 
the past century, Drosophila has been a pioneering model in which 
many of the basic principles governing animal development and 
population biology were established. In the past decade, the genome
sequence of D. melanogaster provided one of the first systematic views
of a metazoan genome, and the ongoing effort by the FlyBase and
Berkeley Drosophila Genome Project (BDGP) groups established a
systematic high-quality genome annotation. Moreover, the fruit-
fly benefits from extensive experimental resources, which enable
novel functional elements to be systematically tested and used in the
evaluation of genetic screens.

The fly research community has sequenced, assembled and anno-
tated the genomes of 12 Drosophila species at a range of evolu-
tionary distances from D. melanogaster (Fig. 1a, b). The analysis of
these genomes was organized around two complementary aims. The
first, described in an accompanying paper, was to understand the
evolution of genes and chromosomes on the Drosophila phylogeny, 
and how it relates to speciation and adaptation. The second goal, 
described here, was to develop general comparative methodologies to 
discover and refine functional elements in D. melanogaster using the 
12 genomes, and to investigate the scaling of discovery power and its 
implications for studies in vertebrates (Fig. 1c). 

Here, we report genome-wide alignments of the 12 species 
(Supplementary Information 1), and the systematic discovery of 
euchromatic functional elements in the D. melanogaster genome. 
We predict and refine thousands of protein-coding exons, RNA 
genes and structures, miRNAs, pre- and post-transcriptional regu- 
latory motifs and regulatory targets. We validate many of these ele- 
ments using complementary DNA (cDNA) sequencing, human 
curation, small RNA sequencing, and correlation with experimen- 
tally supported transcription factor and miRNA targets. In addition, 
our analysis leads to several specific biological findings, listed below. 
• We predict 123 novel polycistronic transcripts, 149 genes with
apparent stop-codon readthrough and several candidate programmed
frameshifts, with potential roles in regulation, localization and func-
tion of the corresponding protein products.

• We make available the first systematic prediction of general RNA
genes and structures (non-coding RNAs (ncRNAs)) in Drosophila,
including several structures probably involved in translational regu-
lation and adenosine-to-inosine RNA editing (A-to-I editing).

• We present comparative and experimental evidence that some
miRNA loci yield multiple functional products, from both hairpin
arms or from both DNA strands, thereby increasing the versatility
and complexity of miRNA-mediated regulation.

• We provide further comparative evidence for miRNA targeting in
protein-coding exons.

• We report an initial network of pre- and post-transcriptional
regulatory targets in Drosophila on the basis of individual high-
confidence motif occurrences.

Comparative genomics and evolutionary signatures. Although 
multiple closely related genomes provide sufficient neutral diver- 
gence for recognition of functional regions in stretches of highly 
conserved nucleotides, measures of nucleotide conservation
alone do not distinguish between different types of functional ele- 
ments. Moreover, functional elements that tolerate abundant ‘silent’ 
mutations, such as protein-coding exons and many regulatory 
motifs, might not be detected when searching on the basis of strong 
nucleotide conservation. 

Across many genomes spanning larger evolutionary distances, the 
information in the patterns of sequence change reveals evolutionary 
signatures (Fig. 2) that can be used for systematic genome annota-
tion. Protein-coding regions show highly constrained codon substi-
tution frequencies and insertions and deletions that are heavily
Figure 1 | Phylogeny and alignment of 12 Drosophila species.

a, Phylogenetic tree relating the 12 Drosophila species, estimated from 
fourfold degenerate sites (Supplementary Methods 1). The 12 species span a 
total branch length of 4.13 substitutions per neutral site. b, Gene order 
conservation for a 0.45-Mb region of chromosome 2L centred on CG4495, 
for which we predict a new exon (Fig. 3a), and spanning 35 genes. Colour 
represents the direction of transcription. Boxes represent full gene models. 
Individual exons and introns are not shown. c, Comparison of evolutionary
distances spanned by fly and vertebrate trees. Pairwise and multi-species 
distances (in substitutions per fourfold degenerate site) are shown from D. 
melanogaster and from human as reference genomes. Note that species with 
longer branches (for example, mouse) show higher pairwise distances, not 
always reflecting the order of divergence. Multi-species distances include all 
species within a phylogenetic clade. 
biased to be multiples of three (Fig. 2a). RNA genes and structures
tolerate substitutions that preserve base pairing (Fig. 2b).
MicroRNA hairpins show a characteristic conservation profile with
high conservation in the stem and mutations in loop regions
(Fig. 2c). Finally, regulatory motifs are marked by high levels of
genome-wide conservation, and post-transcriptional motifs
show strand-biased conservation (Fig. 2d, e).

We find that these signatures can be much more precise for gen- 
ome annotation than the overall level of nucleotide conservation (for 
example, Fig. 3a). 


Revisiting the protein-coding gene catalogue 


The annotation of protein-coding genes remains difficult in meta- 
zoan genomes owing to short exons and complex gene structures 
with abundant alternative splicing. Comparative information has
improved computational gene predictors, but their accuracy still
falls far short of well-studied gene catalogues such as the FlyBase
annotation, which combines computational gene prediction,
high-throughput experimental data and extensive manual
curation. Recognizing this, we set out not only to produce an
independent computational annotation of protein-coding genes in
the fly genome, but also to assess and refine its already high-quality
annotations.

Our analyses of D. melanogaster coding genes are based on two 
independent evolutionary signatures unique to protein-coding 
regions (Fig. 2a): (1) reading frame conservation (RFC), which
observes the tendency of nucleotide insertions and deletions to pre- 
serve the codon reading frame; and (2) codon substitution frequencies 
Figure 2 | Distinct evolutionary signatures for diverse classes of functional
elements. a, Protein-coding genes tolerate mutations that preserve the
amino-acid translation, leading to abundant conservative codon
substitutions (green). Insertions and deletions are largely constrained to be a
multiple of three (grey). In contrast, non-coding regions show abundant
non-conservative triplet substitutions (red), nonsense mutations (blue) and
frame-shifting insertions and deletions (orange). b, RNA genes tolerate
mutations that preserve the secondary structure (for example, single
substitutions involving G•U base pairs and compensatory changes) and
exclude structure-disrupting mutations. Matching parentheses and
matching letters of the alphabet indicate paired bases. c, MicroRNA genes, in
contrast, generally do not show changes in stem regions, but tolerate
substitutions in loop regions and flanking unpaired regions, leading to a 
distinctive conservation profile. Asterisks denote the number of informant 
species matching the melanogaster sequence at each position. d, Regulatory 
motifs tolerate local movement and nucleotide substitutions consistent with 
their degeneracy patterns, and show increased conservation across the 
phylogenetic tree, measured as the branch length score (BLS; Supplementary 
Methods 5a). e, Increasing BLS thresholds select for instances of known 
motifs (black) at increasing confidence (red), as the number of conserved 
instances of control motifs (grey) drops significantly faster. 
Figure 3 | Revisiting the protein-coding gene catalogue and revealing
unusual gene structures. a, Protein-coding evolutionary signatures
correlate with annotated protein-coding exons more precisely than the
overall conservation level (phastCons track), for example excluding highly
conserved yet non-coding elements. Asterisk denotes new predicted exon, 
which we validate with cDNA sequencing (see panel c). The height of the 
black tracks indicates protein-coding potential according to evolutionary 
signatures (top) and overall sequence conservation (bottom). Blue and 
green boxes indicate predicted coding exons (top) and the current FlyBase 
annotation (bottom). The region shown represents the central 6 kb of Fig. 1b, 


(CSF, see Supplementary Methods 2a), which observes mutational 
biases towards synonymous codon substitutions and conservative 
amino acid changes, similar to the non-synonymous/synonymous 
substitution ratio Ka/Ks and other methods.
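
As a rough illustration of the reading-frame signature, the sketch below counts how often the gaps in a pairwise alignment have lengths that are multiples of three. It is a simplified proxy written for this explanation, not the RFC score defined in Supplementary Methods 2a; the function names and the toy sequences are assumptions.

```python
def gap_runs(aligned_seq):
    """Return the length of each maximal run of '-' in an aligned sequence."""
    runs, current = [], 0
    for ch in aligned_seq:
        if ch == "-":
            current += 1
        else:
            if current:
                runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


def frame_preserving_fraction(target, informant):
    """Fraction of gap runs (in either sequence) whose length is a multiple of 3.

    Illustrative proxy only: coding regions are expected to score near 1.0,
    whereas non-coding regions should show many frame-shifting gaps.
    """
    runs = gap_runs(target) + gap_runs(informant)
    if not runs:
        return 1.0  # no indels at all, so the frame is trivially preserved
    return sum(1 for r in runs if r % 3 == 0) / len(runs)


# Toy alignment: one 3-nt gap (frame-preserving) and one 1-nt gap (frame-shifting).
print(frame_preserving_fraction("ATGGCC---AAATTT", "ATGGCCGGGAAA-TT"))
```

With the toy alignment above the function returns 0.5, because one of the two gaps preserves the frame and the other does not.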

Assessing and refining existing gene annotations. We first assessed 
the 13,733 euchromatic genes in FlyBase release 4.3. Using the above
measures, we defined tests that ‘confirmed’ genes supported by the 
evolutionary evidence, ‘rejected’ genes inconsistent with protein-coding 
selection, or ‘abstained’ for genes that were not aligned or with ambigu- 
ous comparative evidence (Supplementary Methods 2a). Of the 4,711 
genes with descriptive names, we confirmed 97%, rejected 1% and 
abstained for 2%, whereas the same criteria applied to 15,000 random 
non-coding regions ≥300 nucleotides rejected 99% of candidates and
confirmed virtually none (Table 1). Together, these results illustrate the 
high sensitivity and specificity of our criteria. 
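
The percentages quoted above follow directly from the counts reported in Table 1. A minimal arithmetic check, using only those counts (the dictionary layout and the function name are illustrative choices):

```python
# Counts from Table 1: (confirmed, abstained, rejected) for each class of region.
TABLE1 = {
    "Named genes": (4566, 105, 40),
    "CGid-only genes": (7879, 729, 414),
    "Non-coding regions": (3, 131, 15430),
}


def outcome_percentages(confirm, abstain, reject):
    """Return (confirm %, abstain %, reject %) rounded to one decimal place."""
    total = confirm + abstain + reject
    return tuple(round(100 * n / total, 1) for n in (confirm, abstain, reject))


for label, counts in TABLE1.items():
    print(label, sum(counts), outcome_percentages(*counts))
```

The row totals come out as 4,711, 9,022 and 15,564, and the rounded percentages agree with the values given in the table.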

Applying the same criteria to the 9,022 genes lacking a descriptive 
name (genes designated only by a CG identifier, referred to hereafter 
as CGid-only genes), our tests accepted 87%, rejected 5% (414 genes) 
and abstained for 8%. This provides strong evidence that most CGid- 
only genes encode proteins, but also suggests that they may be less 
rendered by the UCSC genome browser. b, Results of FlyBase curation of
414 genes rejected by evolutionary signatures (Table 1), and 928 predicted
new exons. c, Experimental validation of predicted new exon from panel
a. Inverse PCR with primers in the predicted exon (green) results in a full-
length cDNA clone, confirming the predicted exon and revealing a new
alternative splice form for CG4495. d, Protein-coding evolution continues
downstream of a conserved stop codon in 149 genes, suggesting translational
readthrough. e, Codon-based evolutionary signatures (CSF score) abruptly 
shift from one reading frame to another within a protein-coding exon, 
suggesting a conserved, ‘programmed’ frameshift. 


constrained and/or may include incorrect annotations. Indeed,
on manual review, 222 (54%) of the 414 rejected CGid-only genes 
were re-categorized as non-protein-coding or deleted (of which 55 
were due to genomically primed clones), 73 (18%) were flagged as 
being of uncertain quality, and the remaining 119 (29%) were kept 
unchanged (Fig. 3b). Some of these are probably rapidly evolving 
protein-coding genes, but others may also prove to be non-protein- 
coding genes or spurious; in fact, none of these had any functional 
gene ontology (GO) annotation.

In addition, we proposed specific corrections and adjustments 
to hundreds of existing transcript models, including translation 
start site adjustments (Supplementary Fig. 2b), alternative splice 
boundaries (Supplementary Fig. 2b), recent nonsense mutations 
(Supplementary Fig. 2c) and alternative translational reading 
frames.

Identifying new genes and exons. To predict new protein-coding 
exons, we integrated our metrics into a probabilistic algorithm that 
determines an optimal segmentation of the genome into protein- 
coding and non-coding regions (Fig. 3a) on the basis of whole- 
genome sequence alignments of the 12 fly species (Supplementary 


Table 1 | Assessment of FlyBase euchromatic protein-coding gene annotations

Regions evaluated      Total    Confirm          Abstain       Reject*
Named genes            4,711    4,566 (96.9%)    105 (2.2%)    40 (0.8%)
CGid-only genes        9,022    7,879 (87.3%)    729 (8.1%)    414 (4.6%)
Non-coding regions†    15,564   3 (0.0%)         131 (0.8%)    15,430 (99.1%)

* A minority of rejected genes are false rejections; see Fig. 3b and text for details.
† Regions ≥300 nucleotides in length randomly chosen from the non-coding part of the genome (see Supplementary Methods 2a).
Methods 2a). Our genome-wide search predicted 1,193 new protein-
coding exons, mostly in euchromatic regions annotated as intergenic
(43%), intronic (26%), or 5’/3’ untranslated region (UTR; 23%) in
FlyBase annotation release 4.3.

We manually reviewed 928 of these predictions according to 
FlyBase standards (Supplementary Methods 2a), leading to 142
new gene models (incorporating 192 predictions) and 438 revised 
gene models (incorporating 562 predictions) (Fig. 3b). In parallel, we 
tested 184 predictions (126 intergenic, 58 intronic) by directed cDNA 
sequencing using inverse polymerase chain reaction (inverse PCR) of 
circularized full-length clones (Fig. 3c), which validated 120 tar-
geted predictions (65%) and an additional 42 predictions not directly
targeted but contained within the recovered transcripts. Predictions 
in intergenic regions yielded 88 full-length cDNAs, providing evid- 
ence for 50 new genes and modification of 39 gene models. 
Predictions within introns of existing annotations yielded 32 full- 
length cDNAs, of which only 18 (56%) represent new splice variants 
of the surrounding gene, whereas the remaining 14 revealed nested or 
interleaved gene structures. This provides additional evidence that 
such complex gene structures are not rare in Drosophila.

Overall, 83% of the 948 predicted exons that we assessed by man-
ual curation or cDNA sequencing were incorporated into FlyBase,
resulting in 150 new genes and modifications to hundreds of existing 
gene models. Finally, the 245 predictions that we did not assess were 
in non-coding regions of existing transcript models, or were already 
included in FlyBase independent of our study. In an independent 
analysis, we predicted 98 new genes on the basis of inferred homo-
logy to predicted genes in the informant species, of which 63%
matched the above predictions. 
Discovering unusual features of protein-coding genes. Our analysis 
also predicted an abundance of unusual protein-coding genes that 
call for follow-up experimental investigation. First, we found open 
reading frames with clear protein-coding signatures and conserved 
start and stop sites on the transcribed strand of annotated UTRs, 
indicative of polycistronic transcripts. These include 73% of
115 annotated dicistronic transcripts and 135 new candidate cistrons 
of 123 genes (Supplementary Fig. 2b). 

Second, we predicted that 149 genes undergo stop codon readthrough, 
with protein-coding selection continuing past a deeply conserved stop 
codon (Fig. 3d), in some cases for hundreds of amino acids. It is unlikely 
that these genes are selenoproteins, as they appear to lack SECIS elements 
that direct selenocysteine recoding. Other mechanisms may instead
be at work, such as regulation of ribosomal release factors, A-to-I
editing, alternative splicing, or other less-characterized mechan-
isms. In fact, these genes are significantly enriched in neuronal proteins
(P= 10“), which frequently undergo A-to-I editing.

Third, we found four genes in which CSF signatures abruptly shift 
from one reading frame to another in the absence of nearby intron— 
exon boundaries or insertions and deletions (Fig. 3e). These are 
suggestive of conserved ‘programmed’ frameshifts™, which are 
thought to be rare in eukaryotes. 

Overall, our results affected over 10% of protein-coding genes, and 
will be available in future releases of FlyBase. They also suggest that 
several types of unusual protein-coding gene structure may be more 
prevalent in the fly than previously appreciated. 


RNA genes and structures 


Several comparative approaches to RNA gene identification have 
been developed that recognize their characteristic properties:
compensatory double substitutions of paired nucleotides (for
example, A•U↔C•G), structure-preserving single-nucleotide muta-
tions involving G•U base pairs (G•U↔G•C and G•U↔A•U), and
few nucleotide substitutions disrupting functional base pairs
(Fig. 2b). To predict new structures, we applied EvoFold in highly
conserved segments of the 12 Drosophila species and focused on high- 
stringency candidates with strong support by compensatory changes 
(Supplementary Methods 4). 
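
The three substitution classes described above can be made concrete with a small classifier for changes at a pair of aligned, base-paired positions. This is an illustrative sketch only: it is not EvoFold, and the function name and the category labels are assumptions introduced here.

```python
# Watson-Crick pairs plus the G•U wobble pair, written as (5' base, 3' base).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def classify_pair_change(ref_pair, obs_pair):
    """Classify the change from a reference base pair to an observed one.

    Both arguments are two-character strings such as "AU" or "GC".
    """
    ref, obs = tuple(ref_pair), tuple(obs_pair)
    if ref not in PAIRS:
        raise ValueError("reference positions are not base-paired")
    if obs == ref:
        return "unchanged"
    if obs not in PAIRS:
        return "disruptive"  # pairing is lost in the informant species
    changed = sum(a != b for a, b in zip(ref, obs))
    # Two changes that restore pairing are a compensatory double substitution;
    # a single change between valid pairs necessarily involves a G•U pair.
    return "compensatory double" if changed == 2 else "structure-preserving single"


print(classify_pair_change("AU", "CG"))  # compensatory double
print(classify_pair_change("GU", "GC"))  # structure-preserving single
print(classify_pair_change("AU", "AC"))  # disruptive
```

A predictor in this spirit would favour candidate structures with many changes of the first two kinds and few of the third.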

Our search led to 394 predictions, recovering 68 known RNA 
structures (primarily transfer RNA genes) in 0.02% of the genome 
(570-fold enrichment). The novel candidates consisted of 177 struc- 
tures in intergenic regions (54%), 103 in introns (32%), 36 in 3’ 
UTRs (11%) and 10 in 5’ UTRs (3%). In addition, we predicted 
200 structures in protein-coding regions (Supplementary Methods 
3). Notably, 75% of 3’ UTR structures and 80% of 5’ UTR structures 
were predicted on the transcribed strand, suggesting that they are 
frequently part of the messenger RNA. In contrast, only 47% of 
intronic structures are on the transcribed strand, suggesting that they 
are largely independent of the surrounding genes. 

Known and novel types of RNA genes. Of the 177 predicted inter-
genic structures, 30 were detected in a tiling-array expression study.
This fraction (17%) is significantly above that for all conserved 
intergenic regions (12%, P= 0.007), but lower than that of known 
Figure 4 | Novel RNA structures. a, New exonic RNA structure spanning 78
of 90 nucleotides of spineless exon 5. b, New intronic RNA structure in
lodestar shows 11 compensatory substitutions and 10 silent G•U
substitutions, providing strong evidence of structural selection (colours as in
Fig. 2b). c, New 5’ UTR structure that overlaps the translation start site of
CG6764, the fly orthologue of yeast ribosomal protein RPL24, suggesting a
potential role in translational regulation. a-c, Structure shown corresponds
to shaded region in the gene model.
intergenic ncRNAs (21%), suggesting that these candidates may be
of lower abundance, temporally or spatially constrained, or might
include false positives. Two predictions were expressed throughout
development, one extending the annotation of a previously reported
but uncharacterized ncRNA and the other probably representing a
novel type of ncRNA. The predictions also included nine novel
H/ACA-box small nucleolar RNA candidates in introns of ribosomal
genes, known to frequently contain small nucleolar RNAs that guide
post-transcriptional base modifications of ncRNAs.

Likely A-to-I editing structures. Many of the 48 intronic candidates 
on the transcribed strand and many of the 200 hairpins in coding 
sequence are probably involved in A-to-I editing or post-transcriptional 
regulation (Fig. 4a). Hairpins in coding sequence were associated with 
11 of the 157 known editing sites (120-fold enrichment) and both 
intronic and coding-sequence hairpins showed a strong enrichment 
for ion-channel genes (6%, P = 0.007 and 10%, P= 2X 10 |, respect- 
ively), known to be frequent editing targets. Editing is known to occur at 
multiple sites in the same gene, and we find an additional 10 hairpins
in known editing targets, as well as 40 additional hairpins clustered in 18
genes not previously known to be edited (for example huntingtin,
which harbours four predicted hairpins, more than any other gene). 
Intronic predictions also showed the highest abundance of compens- 
atory substitutions: for example, Resistant to dieldrin (Fig. 2b) contained 
a 26-base-pair (bp) intronic hairpin flanked by exons known to be 
edited with a striking 16 compensatory changes, lodestar showed
one hairpin with 11 compensatory changes, and Inverted repeat-binding 
protein showed one hairpin with 10 compensatory substitutions 
(Fig. 4b). 

Likely regulatory UTR structures. We predicted 38 structures in 3’ 
UTRs, a density twofold higher than the genomic average, whereas 
fewer than 10 such examples are currently known. A considerable
fraction of these lies in regulatory genes (14 out of 38; P= 10‘), 
including several transcriptional regulators (for example, cas, spen 
and Alh), the tyrosine phosphatase PTP-ER and the translation ini-
tiation factor eIF3-S8. This suggests that many regulatory genes may
themselves be regulated post-transcriptionally through these struc- 
tures. 

3’ UTR structures were also enriched for genes involved in mRNA 
localization (3 out of 38, P= 2.7 X 10“), including oo18 RNA-bind-
ing protein (orb) and staufen (stau), both of which contain double-
stranded RNA-binding domains, are involved in axis specification
during oogenesis, and interact with the mRNA of maternal effect 
protein oskar. The hairpin in orb is known to be important for 
mRNA transport and localization, whereas the highly similar stau
hairpin has not been previously described to our knowledge. 

The ten structures found in 5’ UTRs probably contain binding
sites for factors that regulate translation. For example, the fly homo-
logue of yeast ribosomal protein RPL24 contains a hairpin structure
overlapping its start codon (Fig. 4c). This is interesting in light of
high conservation upstream of the start codon in yeast ribosomal
proteins, and findings that ribosomal proteins bind to their
mRNAs and control translation in prokaryotes.
Conserved RNA structures in roX2 recruit MSL. In an independent 
study, we searched for conserved regions in the non-coding roX1
and roX2 (RNA on the X) genes to gain insights into their function. 
Both RNAs are components of the MSL (Male-specific lethal) com- 
plex and are crucial for dosage compensation in male flies, inducing 
lysine 16 acetylation of histone H4, leading to upregulation of hun-
dreds of genes on the X chromosome. We identified several stem-
loop structures with repeated sequence motifs (for example,
GUUNUACG), and found that tandem repeats of one of these were 
sufficient to recruit MSL complexes to the X chromosome and to 
induce acetylation of lysine 16 of histone H4. Although this structure 
could not fully rescue roX-deficient males, our results suggest that it 
mediates MSL recruitment during roX2-dependent chromatin modi- 
fication and dosage compensation, illustrating the power of evolu- 
tionary evidence for directing experimental studies. 

Prediction and characterization of miRNA genes


Focusing on specific classes of RNA genes markedly increases the
accuracy of RNA gene prediction, as reviewed in refs 35 and 76 and
illustrated here for Drosophila miRNA genes. The common biogenesis
and function of miRNAs lead to evolutionary and structural signatures
(Fig. 2c) that can be used for their systematic de novo discovery.
Using such signatures in the 12 fly genomes (Supplementary Methods 4a,
b), we predicted 101 miRNAs (Supplementary Table 4d), which include
60 of the 74 verified Rfam miRNAs (81%), while spanning less than
0.006% of the fly genome (13,500-fold nucleotide enrichment).
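
The enrichment figure can be checked with simple arithmetic. The sketch below assumes that "nucleotide enrichment" means the fraction of known miRNAs recovered divided by the fraction of the genome covered by predictions; this reading, and the function name, are illustrative assumptions rather than the definition in the Supplementary Methods.

```python
def nucleotide_enrichment(recovered, known, genome_fraction):
    """Sensitivity divided by the fraction of the genome spanned by predictions."""
    return (recovered / known) / genome_fraction

# Values quoted in the text: 60 of 74 Rfam miRNAs, <0.006% of the genome.
sensitivity = 60 / 74
enrichment = nucleotide_enrichment(60, 74, 0.006 / 100)

print(f"sensitivity = {sensitivity:.0%}")
print(f"enrichment  = {enrichment:,.0f}-fold")
```

Because the genome fraction is quoted as an upper bound, the computed value is a lower bound on the enrichment, consistent with the rounded figure of 13,500.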

Comparison of our predictions with high-throughput sequencing data of
short RNA libraries from different stages and tissues of D.
melanogaster revealed that 84 of the 101 predictions (83%), including
24 of the 41 novel predictions (59%), were authentic miRNA genes
(Fig. 5a and Supplementary Table 4d). An independent computational
method had 20 of its 45 novel predictions validated when used across
six Drosophila species. Additional candidates may represent genuine
miRNAs whose temporal or spatial expression pattern does not overlap
with the surveyed libraries.

Several of the validated miRNAs were on the transcribed strand of
introns or clustered with other miRNAs. For example, mir-11 and
mir-998 (the vertebrate homologue of which, mir-29, has been
implicated in cancer) were both found in the last intron of E2f, and
might be involved in cell-cycle regulation (Fig. 5b). Notably, two
predictions overlapped exons of previously annotated protein-coding
genes that were independently rejected above (Fig. 5c), providing an
explanation for the previously observed transcripts of these
annotations and highlighting the importance of specific signatures
for genome annotation.

High-throughput sequencing identified an additional 50 miRNAs not
found computationally, thereby illustrating the limitations of purely
computational approaches. Some of these had precursor structures not
seen previously for animal miRNAs, including unusually long hairpins
and hairpins corresponding to short introns (mirtrons). The remainder
were often less broadly conserved or showed unusual conservation
properties.

Signatures for mature miRNA annotation. The exact position of 5’
cleavage of mature miRNAs is important, because it dictates the core
of the target recognition sequence. This leads to unique structural
and evolutionary signatures, including direct signals, present at the
5’ cleavage site, and indirect signals, stemming from the relationship
of miRNAs with their target genes (Supplementary Methods 4a, c).
Combined into a computational framework, these signatures predicted
the exact start position in 47 of the 60 cloned Rfam miRNAs (78%),
and were within 1 bp in 51 cases (85%). The method disagreed with the
previous annotation in 9 of the 14 Rfam miRNAs that were not
previously cloned, of which 6 were confirmed by sequencing reads,
leading to marked changes in the inferred target spectrum (Fig. 5d).
Prediction accuracy was significantly lower (41% exact, 61% within
1 nucleotide) for novel miRNAs, which, however, also showed less
accurate processing in vivo.
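
To see why a shift of the 5’ end changes the inferred targets, consider the following minimal sketch. It assumes the common convention that the 7mer seed spans positions 2-8 of the mature miRNA and that a target site is its reverse complement; the sequence and the function name are hypothetical and are not taken from the data analysed here.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")

def seed_site(arm, start=0, length=7):
    """Return the mRNA site complementary to the seed (miRNA positions 2-8).

    `arm` is the hairpin arm (5' to 3') and `start` is the 0-based index of
    the annotated first nucleotide of the mature miRNA within that arm.
    """
    seed = arm[start + 1 : start + 1 + length]
    return seed.translate(COMPLEMENT)[::-1]

hairpin_arm = "AUGAGGUAGUAGGUUGUAUAGU"   # hypothetical sequence

annotated = seed_site(hairpin_arm, start=1)  # previous 5' end annotation
revised = seed_site(hairpin_arm, start=0)    # 5' end moved by one nucleotide

print(annotated, revised)
```

The two sites differ even though the annotations are only one nucleotide apart, so the set of transcripts carrying a conserved match can change substantially.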

New insights into miRNA function and biogenesis. We predicted targets
for all conserved miRNAs identified by high-throughput sequencing,
searching for conserved matches to the seed region (similar to
ref. 86) evaluated using the branch length score (Supplementary
Methods 5a), a new scoring scheme described below. Whereas the
resulting miRNA targeting network changed substantially, we found
that the novel and revised miRNAs shared many of their predicted
targets with previously known miRNAs, resulting in a denser network
with increased potential for combinatorial regulation.

For ten miRNA hairpins, the mature miRNA and the corresponding miRNA
star sequence (miRNA*, the small RNA from the opposite arm of the
hairpin) both appeared to be functional: both reached high
computational scores and were frequently sequenced, often exceeding
the abundance of many mature miRNAs (Supplementary Table 4e). The Hox
miRNA mir-10 showed a particularly striking example of a functional
star sequence (Fig. 5e): both arms showed abundant reads, high scores
and highly conserved Hox gene targets, suggesting a key role in Hox
regulation.

In addition, for 20 miRNA loci, the anti-sense strand also folded
into a high-scoring hairpin suggestive of a functional miRNA
(Supplementary Table 4f). Indeed, sequencing reads confirmed that
four of these anti-sense hairpins are processed into small RNAs
in vivo. Thus, a single genomic miRNA locus may produce up to
four miRNAs, each with distinct targets.


Regulatory motif discovery and characterization 


Regulatory motifs recognized by proteins and RNAs to control gene
expression have been difficult to identify because of their short
length, their many weakly specified positions, and the varying
distances at which they can act. Recent studies have shown that
comparative genomics of a small number of species can be used for
motif discovery, on the basis of hundreds of conserved instances
across the genome (Fig. 2d). Many related genomes should lead to
increased discovery power, but also pose new challenges, arising from
sequencing, assembly, or alignment artefacts, and from movement or
loss of motif instances in individual species.

Figure 5 | MicroRNA gene identification and functional implications.
a, New predicted miRNA (mir-190) and its validation by sequencing reads.
Total read counts for mature miRNA (red) and miRNA* (blue) show a
characteristic pattern of processing indicative of miRNAs. Highlighted
regions indicate most abundant processing products. b, Example of
clustered known (mir-11) and new (mir-998) miRNAs in the intron of
cell-cycle regulator E2f. c, Example of a new miRNA (mir-996) in the
transcript of a spurious gene. CG31044 was rejected by our
protein-coding analysis, its transcript probably representing the
precursor of mir-996, with no protein-coding function. d, Revisions to
the 5’ end of miR-274 and miR-263a are proposed on the basis of
evolutionary evidence (for example, 7mer seed conservation; black
curve) and confirmed by sequencing reads. Changes at the 5’ end of
more than one nucleotide result in marked changes to the predicted
target spectra (Venn diagrams). e, Evidence from evolutionary signals
(mature score), sequencing reads and target predictions suggests that
both miR-10 and miR-10* are functional, each targeting distinct Hox
genes.

To account for the unique properties of regulatory motifs, we
developed a phylogenetic framework to assess the conservation of each
motif instance across many genomes. Briefly, we searched for motif
instances in each of the aligned genomes, and based on the set of
species that contained them, we evaluated the total branch length
over which the D. melanogaster motif instance appears to be conserved
(Supplementary Methods 5a, b), which we call the branch length score
(BLS). We used BLS for the discovery of novel motifs (this section)
and for the prediction of individual functional motif instances (next
section).
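
A minimal sketch of the idea follows. It assumes that the BLS of an instance is the summed length of the branches in the smallest subtree connecting the species that contain the motif, expressed as a fraction of the total tree length. The normalization, the function name and the toy tree with its branch lengths are illustrative assumptions; the procedure actually used is described in Supplementary Methods 5a, b.

```python
# Toy phylogeny: child -> (parent, branch length). The topology and lengths
# are made up for illustration and are not the real 12-fly tree.
TOY_TREE = {
    "dmel": ("n1", 0.1),
    "dsim": ("n1", 0.1),
    "n1": ("n2", 0.2),
    "dyak": ("n2", 0.3),
    "n2": ("root", 0.4),
    "dpse": ("root", 0.9),
}

def branch_length_score(tree, species_with_motif):
    """Fraction of total branch length spanned by the species that have the motif."""
    n = len(species_with_motif)
    hits_below = dict.fromkeys(tree, 0)
    for leaf in species_with_motif:
        node = leaf
        while node in tree:              # stop at the root, which has no parent entry
            hits_below[node] += 1
            node = tree[node][0]
    # A branch lies in the connecting subtree only if some, but not all,
    # of the motif-containing species sit below it.
    spanned = sum(length for node, (_, length) in tree.items()
                  if 0 < hits_below[node] < n)
    total = sum(length for _, length in tree.values())
    return spanned / total

print(branch_length_score(TOY_TREE, {"dmel", "dsim", "dyak"}))
print(branch_length_score(TOY_TREE, {"dmel", "dsim", "dyak", "dpse"}))
```

Unlike a requirement for perfect conservation, a score of this kind degrades gradually when an instance is missing from one species, whether through genuine loss or through a sequencing or alignment artefact.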

Predicted motifs recover known regulators. To discover motifs, we 
estimated the conservation level of candidate sequence patterns with 
a motif excess conservation (MEC) score compared to overall con- 
servation levels in promoters, UTRs, introns, protein-coding exons 
and intergenic regions (Supplementary Methods 5a). 

Our search in regions with roles in pre-transcriptional regulation
resulted in 145 distinct motifs (Table 2), obtained by collapsing
variants across 83 motifs discovered in promoters, 35 in enhancers,
20 in 5’ UTRs, 35 in core promoters, 30 in introns and 84 in the
remaining intergenic regions. Motifs discovered in each region showed
similar properties and large overlap: 66 (46%) were discovered
independently in at least two regions and 40 (28%) in at least three,
consistent with shared regulatory elements in these regions.

The 145 discovered motifs match 40 (46%) of the 87 known transcription
factors in Drosophila (Supplementary Table 5c) compared to 8% expected
at random (P = 1 × 10 7°). Several of the non-discovered known motifs
are involved in early anterior-posterior segmentation of the embryo,
consistent with reports that they are largely non-conserved; indeed,
74% of these did not exceed the conservation expected by chance in
promoter regions. Other non-discovered motifs often lacked
characteristics expected for transcription factor motifs, suggesting
that some may be spurious: 49% were unusually long (>10 nucleotides)
compared to 23% of recovered ones, and showed only one or a few total
instances genome-wide, suggestive of individual regulatory sites
rather than motifs.

Table 2 | Pre-transcriptional motifs

Name | Motif consensus | MEC | MCS | Region* | Known transcription factor† | Multiplicity score‡ | ImaGO enrichment§ | ImaGO score§
ME1 | GTCACGTD | 0.448 | 45.41 | PIG | - | - | - | -
ME2 | AWNTGGGTCA | 0.393 | 26.97 | PIG | Hr46 | - | Oesophagus (13-16) | 4.52
ME3 | BCATAAATYA | 0.369 | 36.02 | PCEIG | Caudal | - | Ubiquitous (13-16) | -6.22
ME4 | HAATTAYGCRH | 0.365 | 32.71 | PCE5IG | Engrailed | - | - | -
ME5 | STATAWAWR | 0.358 | 24.31 | C | TATA | - | Ventral nerve cord (13-16) | -5.1
ME6 | VATTWGCAT | 0.356 | 44.06 | PES5IG | - | 3.13 | Ubiquitous (11-12) | -7.15
ME7 | BYAATTARH | 0.338 | 15.45 | PCE5IG | Engrailed | 7.08 | Ubiquitous (11-12) | -10.26
ME8 | HRTCAATCA | 0.338 | 42.32 | PIG | - | - | Dorsal pharyngeal muscle PR (11-12) | -4.15
ME9 | TGACANNNNNNTGACA | 0.336 | 9 | G | - | - | - | -
ME10 | RCGTGNNNNGCAT | 0.329 | 15.94 | PIG | - | - | - | -
ME11 | MATTAAWNATGCR | 0.324 | 12.43 | PIG | acj6 | - | Tracheal PR (11-12) | 4.11
ME12 | TTAATGATG | 0.32 | 20.31 | PG | - | - | - | -
ME13 | WTGACANBT | 0.318 | 63.45 | PE5IG | - | 4.14 | Ubiquitous (13-16) | -3.9/
ME14 | YGACMTTGA | 0.313 | 27.06 | PIG | - | - | Midgut (13-16) | 4.32
ME15 | AATTRNNNNCAATT | 0.309 | 27: | PG | - | - | - | -
ME16 | TGACGTCAT | 0.304 | 12.24 | PC5IG | CrebA | - | - | -
ME17 | MAATTNAATT | 0.304 | 51.57 | PE5IG | - | - | Ubiquitous (11-12) | -6.66
ME18 | MRYTTCCGYY | 0.304 | 39.04 | PEIG | Dorsal | - | Ubiquitous (11-12) | -4.4
ME19 | MATTRRCACNY | 0.303 | 25.24 | PIG | - | - | - | -
ME20 | YTAATGAVS | 0.298 | 44.5 | PEIG | - | - | Foregut PR (11-12) | 4.19
ME21 | TAATTRANNTTNATG | 0.294 | 8.67 | G | - | - | - | -
ME22 | WAATGCGCNT | 0.291 | 8.17 | G | - | - | - | -
ME23 | MATTWRTCA | 0.288 | 46.25 | PEIG | - | - | Dorsal epidermis PR (11-12) | 4.4
ME24 | YAATTWNRYGC | 0.287 | 30.91 | PG | - | 4.27 | Ubiquitous (11-12) | -4.79
ME25 | TTAYGTAA | 0.283 | 3.06 | 5 | Giant | - | Midgut (13-16) | 5.32
ME26 | YGCGTHAATTR | 0.283 | 3.61 | PEG | - | - | - | -
ME27 | AATTRYGWCA | 0.28 | 22.85 | PEIG | - | - | Pericardial cell (13-16) | 4.1
ME28 | GCGCATGH | 0.28 | 30.17 | PCEG | - | - | Ventral nerve cord PR (11-12) | 5.15
ME29 | WAATCARCGC | 0.275 | 3.82 | G | - | - | - | -
ME30 | AATTAANNNNNCATNA | 0.271 | 6.44 | G | Antennapedia | - | - | -
ME31 | GCGTSAAA | 0.271 | 29.95 | PG | - | - | - | -
ME32 | YGCGYRTCAWT | 0.269 | 2.87 | G | - | - | - | -
ME33 | GCGTTGAYA | 0.269 | ml | PG | - | - | - | -
ME34 | AAATKKCATTA | 0.266 | 4.04 | PG | - | - | - | -
ME35 | RACASCTGY | 0.266 | 28.38 | PCEG | Scute | - | Ventral sensory complex SA (11-12) | 4.08
ME36 | TGTCAATTG | 0.265 | 2.65 | PG | - | - | Tracheal system (13-16) | 4.56
ME37 | WAATKNNNNNCRCGY | 0.261 | 23.34 | PEG | - | - | - | -
ME38 | CASGTAR | 0.261 | 9.24 | PEG | Single-minded | 4.58 | Ventral epidermis PR (11-12) | 7.41
ME39 | WCACGTGC | 0.26 | 0.54 | PCE5IG | Enhancer of split | - | - | -
ME40 | CATTANNNWAATT | 0.259 | 9.02 | G | - | - | - | -

The top 40 of 145 are shown. MEC, motif excess conservation; MCS, motif conservation score. See Supplementary Table 5c for the full table.
* Region where the motif was found: P, promoter; C, core promoter; E, enhancers; 5, 5’ UTR; I, intron; G, intergenic genome.
† The known transcription factor motif matching the consensus sequence.
‡ A multiplicity score is reported for motifs with many repeated occurrences.
§ Tissue where motif is most strongly enriched or depleted, and corresponding score (positive, enrichment; negative, depletion). PR, primordium; SA, specific anlage.

Tissue-specific and functional enrichment of novel motifs. The
discovered motifs showed strong signals with respect to embryonic
expression patterns (Fig. 6a). Overall, 75 (52%) were either enriched
or depleted in genes expressed in at least one tissue, compared to 59%
of known motifs and 3% of random controls. Motif depletion may
represent either specific repressors for individual tissues, or
activators excluded from these tissues. Motif depletion was found more
generally in ubiquitously expressed genes (30% of discovered and 34%
of known motifs compared with 1% expected at random), similar to
findings for in vivo binding sites, and probably reflecting less
complex regulation. We also found significant motif enrichment in
groups of genetically interacting genes (collected by FlyBase) that
often function in common developmental contexts or signalling
pathways, genes of metabolic pathways (Kyoto Encyclopedia of Genes and
Genomes, KEGG), and genes with shared functions (GO).

In total, 68% of discovered and 70% of known motifs were enriched or
depleted in one of the functional categories (14% random). Noteworthy
examples include motif ME93 (GCAACA), which was more highly enriched
in neuroblasts (P = 4 × 107 '”) than either of the two well-known
regulators of neuroblast development, prospero and asense
(P = 4 × 10~° and 2 × 107’, respectively). Similarly, motifs ME89
(CACRCAC), ME11 (MATTAAWNATGCR) and ME117 (MAAMNNCAA) were highly
enriched in malpighian tubule (P = 4 × 10’), trachea (P = 4 × 10 °)
and surface glia (6 × 10”), respectively, in each case ranking above
motifs for factors known to be important in these tissues
(Supplementary Table 5c). These presumably correspond to
as-yet-unknown regulators for these tissues.

Exclusion, clustering and positional constraints. A large number of
motifs were depleted in coding sequence (57% of discovered versus
57% of known and 10% of random motifs, P = 3 × 10 '%) and in 3’
UTRs (30% versus 22% and 0%, P = 4 × 10 ''), suggesting specific
exclusion similar to in vivo binding.

Many of the intergenic or intronic instances occurred in clusters, a
property of motifs that has been used to identify enhancer elements.
We assessed increased conservation of motifs when found near other
instances of the same motif (whether conserved or not, to correct for
regional conservation biases), and found significant multiplicity for
19% of the discovered motifs (compared to 24% of known and 4% of
random motifs).

In addition, 15 of the discovered motifs (10%) were significantly
enriched near transcription start sites (compared to 14% of known
and 1% of random motifs). Several were enriched at precise positions
and preferred orientations (Fig. 6b), including close matches to
several known core promoter motifs involved in transcription
initiation. For example, ME5 (STATAWAWR), which matches the TATA-box
motif, displayed a sharp peak on the transcribed strand, 27
nucleotides upstream of the transcription start site. Similarly,
ME120 (TCAGTT), corresponding to the known initiator motif (Inr),
peaked strongly directly on the transcription start site, and ME54
(RCGYRCGY), which matches a known downstream promoter element (DPE),
peaked 30 nucleotides downstream of the transcription start site.

Regulatory motifs involved in post-transcriptional regulation. We
also used BLS/MEC to discover motifs involved in post-transcriptional
regulation, and developed methods to distinguish motifs acting at the
DNA level, motifs acting at the RNA level and motifs stemming from
protein-coding codon biases (Supplementary Methods 5a). Motifs acting
post-transcriptionally at the RNA level generally showed highly
asymmetric conservation, as functional instances can only occur on
the transcribed strand. Indeed, 71 of 90 motifs (79%) discovered in 3’
UTRs showed strand-specific conservation (compared with only 3% of
5’ UTR motifs and 5% of intron motifs, suggesting that these act
primarily in pre-transcriptional regulation).

Overall, 33 motifs discovered in 3’ UTRs were complementary to the 5’
end of Rfam miRNAs, recovering 72% of known miRNAs (68% of 5’ unique
miRNA families). An additional 21 motifs matched the 5’ ends of novel
miRNAs predicted above, of which 12 were validated experimentally,
and 3 motifs matched uniquely to miRNA star sequences, all of which
were abundantly expressed in vivo (Supplementary Table 4e).

line). c, Coding regions show reading-frame-invariant conservation for
miRNA motifs (red) and reading-frame-biased conservation for protein
motifs (grey). MEC scores are evaluated for each of the three reading frame
average MEC for all miRNA motifs and 500 top-scoring protein-coding
motifs (based on MEC without frame correction). d, Motif excess
conservation (MEC) of 7mer complements at different offsets with respect to
miRNA 5’ end, averaged across all Rfam miRNAs. MEC scores evaluated in
protein-coding regions and 3’ UTRs show a highly similar profile
(correlation coefficient 0.96), suggesting similar evolutionary constraints.

We found 33 additional motifs in 3’ UTRs that were apparently not
associated with miRNAs. MO40 (TGTANWTW) closely matches the
Puf-family Pumilio motif. MO32 (AATAAA) corresponds to the
polyadenylation signal and displays both very strong conservation and
a sharply defined distance preference with respect to the end of the
annotated 3’ UTR (P = 10°”). Finally, several motifs (for example,
MO24 = TAATTTAT; MO94 = TTATTTT) are variants of known AU-rich
elements, which are known to mediate mRNA instability and degradation.

MicroRNA targeting in protein-coding regions. Protein-coding regions
can also harbour functional regulatory motifs, such as exonic splicing
regulatory elements. However, motif conservation is difficult to
assess within protein-coding regions because of the overlapping
selective pressures. Indeed, the most highly conserved nucleotide
sequence patterns of length seven (7mers) in coding sequence showed
strong reading-frame-biased conservation, suggesting that they reflect
protein-coding constraints rather than regulatory roles at the DNA or
RNA level (Fig. 6c).

MicroRNA motifs, which function at the RNA level, instead showed high
conservation in all three reading frames, suggesting that they are
specifically selected within coding regions for their RNA-level
function. Indeed, previous studies have shown that miRNA motifs in
coding regions are preferentially conserved in vertebrates, that they
can lead to repression in experimental assays, and that they are
avoided in genes co-expressed with the miRNA. Frame-invariant
conservation allows us to demonstrate the coding-region targeting of
individual miRNAs, and also enables the de novo discovery of miRNA
motifs in coding regions. Using frame-invariant conservation, we
recovered 11 miRNA motifs within the top 20 coding-region motifs
(Supplementary Table 5g), whereas using overall conservation required
several hundred candidates to recover 11 miRNA motifs.
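
The distinction between frame-biased and frame-invariant conservation can be illustrated with a short sketch. It assumes that each instance of a 7mer in coding sequence is recorded with its offset from the start codon and a flag saying whether it is conserved, and it summarizes frame invariance as the lowest per-frame conservation rate. That summary statistic, the function names and the toy data are assumptions made for illustration; the frame-corrected MEC score itself is defined in Supplementary Methods 5a.

```python
from collections import defaultdict

def per_frame_conservation(instances):
    """Conservation rate per reading frame.

    `instances` is an iterable of (cds_offset, is_conserved) pairs for one
    k-mer, where cds_offset is the 0-based distance from the start codon.
    """
    totals = defaultdict(int)
    conserved = defaultdict(int)
    for offset, is_conserved in instances:
        frame = offset % 3
        totals[frame] += 1
        conserved[frame] += int(is_conserved)
    return {f: (conserved[f] / totals[f] if totals[f] else 0.0) for f in range(3)}

def frame_invariant_score(instances):
    """A k-mer scores highly only if it is conserved in all three frames."""
    return min(per_frame_conservation(instances).values())

# Toy data: a protein-like 7mer conserved in one frame only, and a
# miRNA-like 7mer conserved regardless of frame.
protein_like = [(0, True), (3, True), (1, False), (4, False), (2, False), (5, False)]
mirna_like = [(0, True), (3, False), (1, True), (4, True), (2, True), (5, False)]

print(frame_invariant_score(protein_like), frame_invariant_score(mirna_like))
```

Taking the minimum penalizes patterns whose conservation is explained by a single codon phase, which is the behaviour expected of sequences constrained by the encoded protein.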

Moreover, 7mers complementary to different positions in the mature
miRNA show a distinctive conservation pattern indicative of functional
targeting in coding regions (Fig. 6d) and similar to that found in 3’
UTRs (correlation coefficient 0.96). Finally, 6mers complementary to
miRNA 5’ ends were depleted in coding exons of anti-target genes
(Supplementary Fig. 5f), similar to findings for these genes’ 3’ UTRs.
Overall, these results, together with findings in vertebrates, suggest
that important miRNA targets have been overlooked by many target
prediction methods that have traditionally focused exclusively on 3’
UTR sequences.


Prediction of individual regulator binding sites 

Previous methods for regulatory motif discovery integrated
conservation information over hundreds of motif instances across
the genome, leading to an exceedingly clear signal for motif discovery
even if many of these instances are only marginally conserved. In
contrast, the reliable identification of individual motif instances
has been hampered by lack of neutral divergence and would require
many related genomes. In the absence of such data, previous studies
have relied on motif clustering or other sequence characteristics to
predict regulatory targets or regions.

With the availability of the 12 fly genomes, we inferred
high-confidence instances of regulatory motifs by mapping the BLS of
each motif instance to a confidence value (Supplementary Methods 5a).
This value represents the probability that a motif instance is
functional, on the basis of the conservation level of appropriate
control motifs evaluated in the same type of region (promoter, 3’ UTR,
coding, and so on). Because the number of conserved instances
decreases much more rapidly for control motifs than for real motifs,
the many genomes allowed us to reach high confidence values for many
transcription factors and miRNAs, even at relatively modest BLS
thresholds (Fig. 2e).
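
One way to picture the mapping from BLS to confidence is as a comparison of real and control motifs at the same threshold. The sketch below assumes that the confidence at a given BLS threshold is one minus the ratio of the fraction of control instances to the fraction of real instances that reach that threshold. This formula, the function name and the toy scores are assumptions for illustration only; the calibration actually used is described in Supplementary Methods 5a.

```python
def confidence_at(threshold, motif_bls, control_bls):
    """Estimated fraction of conserved motif instances not explained by chance.

    motif_bls and control_bls are BLS values for all instances of the real
    motif and of its control motifs within the same type of region.
    """
    motif_frac = sum(b >= threshold for b in motif_bls) / len(motif_bls)
    control_frac = sum(b >= threshold for b in control_bls) / len(control_bls)
    if motif_frac == 0:
        return 0.0                       # nothing conserved at this threshold
    return max(0.0, 1.0 - control_frac / motif_frac)

# Toy BLS values: the control distribution falls off faster than the real one.
motif_bls = [0.9, 0.8, 0.7, 0.6, 0.2, 0.1, 0.0, 0.0, 0.0, 0.0]
control_bls = [0.7, 0.3, 0.2, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

for threshold in (0.0, 0.5, 0.8):
    print(threshold, confidence_at(threshold, motif_bls, control_bls))
```

Under this reading, confidence rises with the threshold because control instances drop out faster than real ones, which matches the behaviour described in the text.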
Conserved motif instances identify functional in vivo targets. We 
found that increasing confidence levels selected for functional 
instances for both transcription factor and miRNA motifs: the nor- 
malized fraction of transcription factor motif instances within pro- 
moter regions rose from 20% to 90%; that of miRNA motif instances 
within 3’ UTRs rose from 20% to 90%; and the fraction of miRNA 
motif instances on the transcribed strand of 3’ UTRs rose from 50% 
(uniform) to 100% (Fig. 7a); in each case selecting the regions and 
strands where the motifs are known to be functional. 

We further assessed how predicted motif instances compared with
in vivo targets in promoter regions, defined experimentally (without
comparative information). We used a set of high-confidence direct
CrebA targets and three genome-wide chromatin immunoprecipitation
(ChIP) data sets for Snail, Mef2 and Twist, and in each case found
that the enrichment between conserved motif instances and known
in vivo regions increased sharply for increasing confidence values
(Fig. 7b).

illustrates the high sensitivity of the BLS approach. d, Comparison of
chromatin immunoprecipitation (ChIP) and conservation in their ability to
identify functional motif instances. Motif instances that are both ChIP-
bound and conserved (purple) show the strongest functional enrichment in
muscle genes for Mef2 and Twist (depletion for Snail), whereas motif
the two approaches perform comparably. Even the sites recovered by
conservation alone outside bound regions (pink) show enrichment levels
comparable to ChIP, suggesting that they are also functional.

We also found that a large fraction of motif instances in
experimentally determined target regions was conserved (Fig. 7c): 76%
of motif instances in direct CrebA targets and 90% of motif instances
in experimentally supported miRNA targets were recovered at 60%
confidence. Although many of the miRNA targets stem from comparative
predictions and are expected to be well conserved, their high recovery
rate illustrates the increased sensitivity of the BLS measure compared
to perfect conservation (Supplementary Fig. 7d). Similar results were
found for motifs in known enhancers that were determined to be bound
by ChIP (‘ChIP-bound’): 65% of Mef2 motifs, 65% of Snail motifs and
25% of Twist motifs were conserved (Fig. 7c).

ChIP-determined and conservation-determined targets show similar
enrichment. To determine whether ChIP-bound motifs that lack
conservation are biologically meaningful, we studied their enrichment
in muscle gene promoters. We found that motifs that were both bound
and evolutionarily conserved showed very strong correlation with
muscle genes for all three factors: Mef2 showed eightfold enrichment,
Twist showed sevenfold enrichment and Snail, a mesodermal repressor,
showed threefold depletion for muscle genes. However, when only
non-conserved sites were considered, the correlation dropped
significantly to 1- to 2-fold for all three factors, suggesting that
non-conserved ChIP-bound sites may be of decreased biological
significance (Fig. 7d).
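
Fold enrichment of this kind is a ratio of proportions. The sketch below assumes that enrichment is the fraction of a factor's targets that are muscle genes divided by the fraction of all genes that are muscle genes, and it adds a hypergeometric tail probability as one common way to attach a P value. The gene counts are hypothetical and the choice of test is an assumption, not a statement of the statistics used in the study.

```python
from math import comb

def fold_change(targets_in_set, targets, set_size, genome_size):
    """(fraction of targets in the gene set) / (fraction of all genes in the set)."""
    return (targets_in_set / targets) / (set_size / genome_size)

def hypergeom_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K belong to the set."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

# Hypothetical counts: 10,000 genes, 500 muscle genes, 100 predicted targets,
# 40 of which are muscle genes.
fold = fold_change(40, 100, 500, 10_000)
p_value = hypergeom_tail(40, 100, 500, 10_000)

print(f"{fold:.1f}-fold enrichment, P = {p_value:.1e}")
```

A value below 1 from the same ratio corresponds to depletion, as reported for the repressor Snail.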

We also used the correlation with muscle genes to compare ChIP- 
on-chip and evolutionary conservation as two complementary meth- 
ods for target identification (Fig. 7d). We found that the enrichment of 
conservation-inferred targets was consistently higher than the enrich- 
ment of ChIP-inferred targets for each of the three factors. Finally, we 
assessed the functional significance of motif instances that were only 
found by the conservation approach, specifically excluding those in 
ChIP-bound regions, and found that these were also enriched in the 
same functional categories as ChIP-bound sites with comparable or 
higher functional correlations (Fig. 7d). This suggests that the addi- 
tional conserved instances are indeed functional, probably reflecting 
the higher coverage of conservation-based approaches, which are not 
restricted to the experimental conditions surveyed, or that they may be 
bound in vivo yet missed by ChIP-on-chip technology.

In an independent study we compared several strategies for the
prediction of motif instances and cis-regulatory modules and found
that using the 12 fly genomes led to substantial improvements. In
another study, we reported the recovery of conserved motifs for
several known regulators, including Suppressor of Hairless, in genes
of the Enhancer of split complex.

A regulatory network of D. melanogaster at 60% confidence. 
Having established the accuracy of conserved motif instances, we 
present an initial regulatory network for D. melanogaster at 60% 
confidence (Supplementary Fig. 5i), containing 46,525 regulatory 
connections between 67 transcription factors and 8,287 genes, and 
3,662 connections between 81 cloned miRNAs (clustered in 49 fam- 
ilies with unique seed sequences) and 2,003 genes. 

The distribution of predicted sites per target gene is highly
non-uniform and indicative of varying levels of regulatory control.
Genes with the highest number of sites appeared to be enriched in
morphogenesis, organogenesis, neurogenesis and a variety of tissues,
whereas ubiquitously expressed genes and maternal genes with
housekeeping functions had the fewest sites. Interestingly,
transcription factors appeared to be more heavily targeted than other
genes, both by transcription factors (10 sites versus 5.5 on average,
P = 107 '°) and by miRNAs (2.3 versus 1.8 miRNAs, P = 5 × 10 °).
Moreover, genes with many transcription factor sites also had many
miRNA sites, and conversely, genes with few transcription factor sites
also had few miRNA sites (P = 10-* and P = 7 × 107°, respectively).

Several of the predicted regulatory connections have independent
experimental support (Supplementary Table 5h), including direct
regulation of achaete by Hairy, of giant by Bicoid, of Enhancer of
split complex genes by Suppressor of Hairless, and of bagpipe by
Tinman (known to cooperate in mesoderm induction and heart
specification). More generally, when tissue-specific expression data
were available, we found that on average 46% of all targets were
co-expressed with their factor in at least one tissue (Supplementary
Fig. 5i), which is significantly higher than expected by chance
(P = 2 × 10 °).


Scaling of comparative genomics power 

Theoretical considerations and pilot studies on selected genomic
regions showed that the discovery power of comparative methods
scales with the number and phylogenetic distance of the species
compared. We extended these analyses by investigating the scaling of
genome-wide discovery power using evolutionary signatures for each
class of functional elements (Fig. 8), on the basis of the recovery of
known elements using different subsets of informant species (at a
fixed stringency).

We found that recovery consistently increased with the total number
of informant species, and that multi-species comparisons outperformed
pairwise comparisons within the same phylogenetic clade. When we
examined subsets of informants with similar total branch length (for
example, several close species versus one distant species),
multi-species comparisons sometimes performed better (protein-coding
exons, ncRNAs), comparably (motifs), or worse (miRNAs) than pairwise
comparisons. This complex relationship between total branch length
and actual discovery power probably reflects imperfect genome
assemblies/alignments, characteristics of each class of functional
elements, and the specific methods we used. For example, ncRNA
discovery probably benefits from observing more compensatory changes
across more genomes, whereas miRNA discovery may be more sensitive to
artefacts in low-coverage genomes, given the expected high
conservation of miRNA arms.

Figure 8 | Scaling of discovery power with the number and distance of
informant species. a, Discriminatory power of CSF protein-coding
evolutionary metric for varying exon lengths and using different
numbers of informant species. Sensitivity is shown for known exons at
a fixed false-positive rate based on random non-coding regions. Mean
length is shown for each exon length quantile. Multi-species
comparisons increase discovery power, especially among short exons.
b, Recovery of known ncRNAs (among the top 100 predictions) for
pairwise (blue) and multi-species (red) comparisons. c, Recovery of
cloned miRNAs (among the top 100 predictions). d, Recovery of
transcription factor and miRNA motifs with instances at 60%
confidence.

As expected, longer elements were easier to discover than shorter 
elements. Long protein-coding exons (>300 nucleotides) were 
recovered at very high rates even with few species at close distances 
(leaving little room for improvement with additional species). In 
contrast, more informant species and larger distances were crucial 
for recovering short exons, miRNAs and regulatory motifs. 

Notably, the optimal evolutionary distance for pairwise compar- 
isons to D. melanogaster also seemed to depend on element length: for 
long protein-coding exons, the best pairwise informant was the clo- 
sely related D. erecta, for exons of intermediate lengths D. ananassae, 
and for the shortest exons the distant D. willistoni (Supplementary 
Table 7a). Distant species were also optimal for other classes of short 
elements (ncRNAs, miRNAs and motifs, Fig. 8b—d). This suggests 
that a small number of species at close evolutionary distances may 
generally allow the discovery of long elements, possibly including 
clade-specific elements, whereas short clade-specific elements may 
not be reliably detectable without many genomes at close distances. 

Finally, we investigated the effect of alignment choice on our 
results (Supplementary Fig. 8). We found high similarity between 
different alignment strategies for longer elements (>93% agreement 
for exons), whereas shorter elements showed larger discrepancies 
between alignments (81% and 59% agreement for miRNA and motif 
instances, respectively). 

Although factors such as genome size, repeat density, pseudogene 
abundance and physiological differences might confound a simple 
analogy to the vertebrate phylogeny based on neutral branch length 
(Fig. 1c), our results suggest that comparisons spanning marsupials, 
birds and reptiles may prove surprisingly useful for biological signal 
discovery in the human genome. 


Discussion 


Our results demonstrate the potential of comparative genomics for 
the systematic characterization of functional elements in a complete 
genome. Even in a species as intensely studied as D. melanogaster, our 
methods predicted several thousand new functional elements, 
including protein-coding genes and exons, novel RNA genes and 
structures, miRNA genes, regulatory motifs, and regulator targets. 
Our novel predictions have overwhelming statistical support, often 
surpassing that of known functional elements, and are additionally 
supported by experimental evidence in hundreds of cases. The com- 
mon underlying methodology in this study has been the recognition 
of specific evolutionary signatures associated with each class of func- 
tional elements, which can be much more informative for genome 
annotation than overall measures of nucleotide conservation. These 
signatures are general and are immediately relevant to the analysis of 
the human genome and more generally of any species. 

In addition to the many new elements, we gained specific bio- 
logical insights and formulated hypotheses that we hope will guide 
follow-up experiments. We found 149 genes with potential trans- 
lational readthrough, showing protein-like evolution downstream 
of a highly conserved stop codon, and possibly encoding additional 
protein domains or peptides specific to certain developmental con- 
texts. We also found several candidate programmed frameshifts, 
which might be part of regulatory circuits (as for ODC/Oda) or 
help expand the diversity of protein products generated from one 
mRNA, similar to their role in prokaryotes. We also presented 
evidence of miRNA processing from both arms of a miRNA hairpin 
and from both DNA strands of a miRNA locus in some cases, potentially 
leading to as many as four functional miRNAs per locus. As 
miRNA/miRNA* pairs are expressed from a single precursor and 
thus co-regulated, whereas sense/anti-sense pairs are expressed from 
distinct promoters, the use of both arms or both strands provides 
compelling general building blocks for higher-level miRNA- 
mediated regulation. 

The newly discovered elements did not dramatically increase the 
total number of annotated nucleotides. Known and predicted elements 
explain 42% of nucleotides in phastCons elements, compared 
to 35.5% for previous annotations (Supplementary Fig. 6), 
an 18% increase (mostly owing to conserved motif instances). The 
remaining phastCons elements and independent estimates based on 
transcriptional activity would suggest that a much higher fraction 
of the genome may be functional (Supplementary Fig. 6). Although it 
is possible that these estimates are artificially high and that we are in 
fact converging on a complete annotation of the fly genome, they 
might instead indicate that much remains to be discovered, which 
may require the recognition of as-yet-unknown classes of functional 
elements with distinct evolutionary signatures. 

Our results also allowed us to compare and contrast evolutionary 
and experimental methods for the recovery of functional elements, 
particularly for the identification of regulator targets. We found that 
comparative genomics resulted in many functionally meaningful 
sites for transcription factors Mef2, Twist and Snail outside ChIP- 
bound regions, probably representing targets from diverse condi- 
tions not surveyed experimentally. Similarly, ChIP resulted in many 
additional sites outside those recovered by comparative genomics: 
some of these may have been replaced by functionally equivalent 
non-orthologous sequence, rendering them apparently non-conserved 
in sequence alignments; others may have species- or lineage- 
specific roles, thus lacking sufficient signal for their comparative detection; 
finally, some bound sites may be biochemically active yet selectively 
neutral. It is worth noting, however, that ChIP-bound motifs 
that were not conserved showed decreased enrichment in muscle/ 
mesoderm development where the factors are known to act, suggesting 
that potential lineage-specific roles may lie outside the regulators’ 
conserved functions. To resolve these questions, comparative geno- 
mics studies would benefit greatly from experimental studies in several 
related species in parallel. 

Overall, comparative genomics and species-specific experimental 
studies provide complementary approaches to biological signal dis- 
covery. Comparative studies help pinpoint evolutionarily selected 
functional elements across diverse conditions, whereas experimental 
studies reveal stage- and tissue-specific information, as well as spe- 
cies-specific sites. Ultimately, their integration is a necessary step 
towards a comprehensive understanding of animal genomes. 


METHODS SUMMARY 


The Methods are described in Supplementary Information, with more details 
found in the cited companion papers for each section. The sections of the 
Supplementary Methods are arranged in the same order as the manuscript to 
facilitate cross-referencing, with an index on the first page to aid navigation. 

Constraint and turnover in sex-biased gene 
expression in the genus Drosophila 


Yu Zhang, David Sturgill, Michael Parisi, Sudhir Kumar & Brian Oliver 


Both genome content and deployment contribute to phenotypic 
differences between species. Sex is the most important difference 
between individuals in a species and has long been posited to 
be rapidly evolving. Indeed, in the Drosophila genus, traits such as 
sperm length, genitalia, and gonad size are the most obvious differences 
between species. Comparative analysis of sex-biased 
expression should deepen our understanding of the relationship 
between genome content and deployment during evolution. Using 
existing and newly assembled genomes, we designed species-specific 
microarrays to examine sex-biased expression of orthologues 
and species-restricted genes in D. melanogaster, D. simulans, 
D. yakuba, D. ananassae, D. pseudoobscura, D. virilis and D. mojavensis. 
We show that averaged sex-biased expression changes accumulate 
monotonically over time within the genus. However, 
different genes contribute to expression variance within species 
groups compared to between groups. We observed greater turn- 
over of species-restricted genes with male-biased expression, indi- 
cating that gene formation and extinction may play a significant 
part in species differences. Genes with male-biased expression also 
show the greatest expression and DNA sequence divergence. This 
higher divergence and turnover of genes with male-biased expres- 
sion may be due to high transcription rates in the male germline, 
greater functional pleiotropy of genes expressed in females, and/or 
sexual competition. 

There are numerous case studies demonstrating that orthologues 
with sex-biased function diverge more rapidly than genes with non-biased 
function. To determine systematically the relative contributions 
of gene content and expression divergence to sexual differences, 
we sampled sex-biased expression within the Drosophila genus 
using species-specific microarrays designed for the closely related 
D. melanogaster, D. simulans and D. yakuba group (common 
ancestor, 10-13 million years ago), and for the more distantly related 
D. ananassae, D. pseudoobscura, D. virilis and D. mojavensis (com- 
mon ancestor, 40-65 million years ago) (Supplementary Table 1). 
The species-specific platform eliminated confounding effects of 
sequence divergence on hybridization and allowed us to assay the 
expression of lineage-restricted genes. 


Figure 1 | Sex-biased expression in Drosophila species. a–g, Sex-biased 
female:male expression ratio (log2) versus average expression intensity 
(log2) plots for each Drosophila species. Expression intensities are arbitrary, 
where zero represents the minimum value. Values for genes with significant 
(P = 0.01, false-discovery-rate-corrected Mann–Whitney test) female-biased, 
male-biased and non-biased expression are shown. The per cent of 
genes with female-biased or male-biased expression is inset in each panel. D. 
melanogaster, D. mel; D. simulans, D. sim; D. yakuba, D. yak; D. ananassae, 
D. ana; D. pseudoobscura, D. pse; D. virilis, D. vir; and D. mojavensis, D. moj. 


Previous work has demonstrated that sex-biased expression in 
D. melanogaster adults is substantial, primarily owing to gametogenesis. 
This seems to be characteristic for the genus (Fig. 1, and 
Supplementary Fig. 1). Generally, we observed greater male-biased 
expression (~7–14% of the transcriptome) relative to female-biased 
expression (~3–9% of the transcriptome), at a significance value of 
P = 0.01 (Mann–Whitney, false-discovery-rate-corrected). The 
exceptions were D. pseudoobscura (~16% female- and male-biased 
expression) and D. mojavensis (~12% female- and male-biased 
expression). Additionally, the magnitude of male-biased expression 
was generally greater than the female-biased expression: the average 
log2 female:male expression ratio was -1.2 for genes with male- 
biased expression and 0.8 for genes with female-biased expression. 
This indicates that there were more genes approaching male-specific 
expression than female-specific expression. The genes that showed 
sex-biased expression in each species are listed in Supplementary 
Information (Supplementary Tables 3-16). 

To examine expression divergence over time, we parsed the genes 
with orthologues in every species and constructed a pairwise matrix 
of log2 female:male expression ratios. We compared expression 
within species (two strains of D. simulans), between species within 
the closely related melanogaster subgroup, and between all seven 
species (Fig. 2a–c). Similar pairwise matrices for quadruplicate 
replicates within each species were also plotted as a baseline measure- 
ment of technical noise and biological variability (Supplementary 
Fig. 2). All expression ratio plots were linear and showed increasing 
expression divergence with inferred genetic distance. 

There was an especially clear relationship between sequence and 
expression divergence. Neighbour-joining trees built from expression 
divergence (from the pairwise expression ratios between each species; 
1 − Pearson’s r; Supplementary Fig. 3) or from sequence divergence 
have the same topology (Fig. 2d). Expression divergence tightly 
correlated with time (Fig. 2e, r² = 0.96), which may provide a useful 
tool in molecular phylogenetics. 

Although the whole-genome trends in expression divergence were 
clear, at the gene level the magnitude of expression 
divergence was modest. Only 384 orthologue pairs (0.3%) showed 
significant female-biased expression in one species and significant 
male-biased expression in another. Switches between highly female- 
biased expression and highly male-biased expression were never 
observed (Fig. 2c). Extensive (20%) categorical changes in sex-bias 
class, especially for genes with male-biased expression, were previously 
reported between D. melanogaster and D. simulans. We 
observed a categorical change in sex-biased expression in 12% of the 
orthologues between these two species, but the changes were domi- 
nated by low magnitude changes between modest sex-biased expres- 
sion and non-sex-biased categories. These values are highly sensitive 
to arbitrary significance-level cut-offs; however, it was clear in explor- 
atory plots of expression ratios that genes with male-biased expression 
showed greater expression divergence (Fig. 2b, c). Plots of expression 
ratio standard deviations against average expression ratio (Fig. 3a) 
also showed a clear excess of variable expression among orthologues 
with male-biased expression (P< 107°, chi-squared test). Thus, male- 
biased expression contributes heavily to overall expression divergence. 

To determine if particular types of genes show greater or lesser 
expression divergence we analysed Gene Ontology (GO) terms. 
Unsurprisingly, genes annotated as ‘unknown function’ are signifi- 
cantly over-represented (P< 107°, Fisher’s exact test) among genes 
with variable expression. Genes with ‘transcriptional regulation’ 
annotations were under-represented in the same gene set 
(P<10 “, Fisher’s exact test), suggesting that genes involved in 
transcription regulation are under constraint. Similar constrained 
expression of transcriptional regulators was observed in a study of 
metamorphosis in the melanogaster subgroup. 

Just as changes in DNA sequence can have consequences ranging 
from deleterious to neutral to advantageous, changes in gene 

Figure 2 | Expression divergence among common orthologues. 
Female:male expression ratios (log2) for orthologue pairs, plotted against 
each other: a, two different D. simulans strains; b, the melanogaster 
subgroup (D. melanogaster, D. simulans and D. yakuba); c, all seven 
Drosophila species. All the density (grey for high, black for low) scatter plots 
include every 1:1 pair of common orthologues for which both have an 
expression value. In b and c, the species A and B designation is arbitrary, 
but A is assigned to the species in the pair most closely related to 
D. melanogaster. d, Neighbour-joining trees with branch lengths inferred 
using sequence distance (genomic mutation distance) and the expression 
distance (1 − Pearson’s r) for all pairs of species except the pairs between the 
D. ananassae outlier and other species (Supplementary Fig. 3). e, Expression 
distance values plotted against estimated divergence time for all possible 
species pairs and replicates within species. Quadruplicate replicates within 
each species were used at a time of 0 million years. 


©2007 Nature Publishing Group 


NATURE|Vol 450|8 November 2007 


expression should have variable effects, owing to underlying 
mutations in transcription factors, cis-regulatory sites and post- 
transcriptional regulators, and the resulting variance will be subject 
to drift and selection. We were able to distinguish expression 
differences between species well enough to show a linear relationship 
with time at the full-transcriptome level, but does this apply 
to individual genes? 

To determine if there is a common set of orthologues that can 
tolerate variable expression (that can be thought of as the thematic 
equivalent of a synonymous codon substitution), we asked if expression 
divergence between orthologues within the melanogaster subgroup 
correlates with the expression divergence between more 
distantly related species. We found no significant correlation in 
orthologue expression divergence between the two groups of species (r = 
0.08, Fig. 3b). The genes with greater expression divergence in the melanogaster 
subgroup differ from those in the remaining species. Thus, 
although overall expression divergence shows a clock-like behaviour 
(reflecting mutation accumulation in a neutral model, or an adaptive 
speed limit in a selection model), different individual genes contri- 
bute to this global expression divergence in different amounts. This 
suggests that there is not a common set of genes that tolerate large 
drifts in sex-biased expression ratios. 

To analyse further the orthologues with the most divergent 
expression, we selected orthologues with the greatest expression 
divergence (s.d. > 0.5) and subjected them to cluster analysis with 
species-order fixed (Fig. 3c). Strikingly, even those genes with the 
most variable expression were organized into well-defined clusters. 
Each of the clusters was subsequently analysed to look for patterns of 
change. We observed three distinct cluster types revealing expression 
divergence between lineages, aberrant expression in a single species, 
and unpatterned variability (Fig. 3c, d). For example, cluster ‘A’ 
shows higher male-biased expression in just the melanogaster 

Figure 3 | Expression divergence within and between species and groups. 
a, Average female:male expression ratios for common orthologues plotted 
against expression divergence (expression ratio standard deviations between 
7 species) for the same orthologues. b, Expression ratio standard deviations 
among members of the melanogaster subgroup (D. melanogaster, 
D. simulans and D. yakuba) plotted against standard deviations among the 

subgroup (D. melanogaster, D. simulans and D. yakuba); cluster ‘B’ 
shows increased male-biased expression in D. pseudoobscura only; 
and cluster ‘C’ shows no evidence for a phylogenetic trend. Briefly, 
among the 5% of common orthologues with the most variable 
expression, 52% exhibited lineage-specific, 22% species-specific 
and only 25% unpatterned expression variability. 
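
As a rough illustration of the selection step described above, the sketch below computes the standard deviation of log2 female:male ratios across species for each orthologue and keeps those above 0.5. It is not the authors' code: the function name, the toy gene identifiers and the use of the sample (n − 1) standard deviation are assumptions, as the text does not state which estimator was used.

```python
import statistics


def most_variable(ratios_by_gene, threshold=0.5):
    """Return {gene: s.d.} for genes whose ratio s.d. exceeds the threshold.

    ratios_by_gene maps a (hypothetical) orthologue identifier to a list of
    log2 female:male expression ratios, one value per species.
    """
    # Sample standard deviation across species (assumed estimator).
    sd = {gene: statistics.stdev(values) for gene, values in ratios_by_gene.items()}
    return {gene: s for gene, s in sd.items() if s > threshold}


# Toy data for seven species; the values are invented for illustration.
toy = {
    "g1": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],          # invariant
    "g2": [-2.0, -2.0, -2.0, 0.0, 0.0, 0.0, 0.0],       # lineage-specific shift
    "g3": [0.1, 0.2, 0.1, 0.2, 0.1, 0.2, 0.1],          # low-level noise
}
selected = most_variable(toy)
```

Only genes passing this filter would then be passed on to clustering; the threshold is exposed as a parameter because the cut-off is a choice of the analysis rather than a property of the data.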

Having only a few sequenced genomes seriously hinders the study 
of genes that are species- or lineage-specific (species-restricted). We 
took advantage of the species-specific array design to determine the 
contribution of common orthologues and species-restricted genes to 
overall sex-biased expression patterns (Fig. 4a, b). Female-biased 
expression was over-represented (P< 107’, chi-squared test) among 
common orthologues in four of the seven species, whereas male- 
biased expression was always under-represented. The pattern was 
reversed among the species-restricted genes. Female-biased expres- 
sion of species-restricted genes was less prevalent in all species except 
D. virilis, and male-biased expression was more prevalent in each of 
the species examined. Female-biased expression was also under- 
represented among paralogues (Supplementary Fig. 4). Similar 
results were obtained using TBLASTN methods to detect genes that 
had diverged to obscure orthology (Supplementary Fig. 5). These results 
suggest that genes with male-biased expression have higher effective 
birth and extinction rates. 
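
The over- and under-representation tests above compare the share of sex-biased genes in one gene class with the share among the remaining genes. A minimal sketch of such a comparison, written as a 2 × 2 Pearson chi-squared test, follows. The table layout, the function name and the counts are invented for illustration, and no continuity correction is applied; the exact contingency design used in the study is not given in the text.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared test (1 d.f.) for the table [[a, b], [c, d]].

    Returns (statistic, P-value). For 1 d.f. the upper-tail probability of
    the chi-squared distribution equals erfc(sqrt(x / 2)).
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical counts: rows are gene classes (species-restricted, common
# orthologues); columns are (male-biased, not male-biased).
stat, p_value = chi2_2x2(50, 50, 20, 80)
```

A small P-value here would indicate only that the two classes differ in their proportion of male-biased genes; the direction of the difference has to be read from the proportions themselves.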

We also asked if sex-bias and expression divergence correlate with 
sequence divergence among orthologues. If similar selective pressure 
acts on both protein-coding capacity and expression at a given locus, 
then they should correlate. However, protein-coding capacity and 
expression divergence need not be tightly coupled. For example, high 
expression divergence can result from changes in upstream transcription 
factors or the cis-regulatory sites that they bind. 

Synonymous (Ks) and non-synonymous (Ka) substitution rates in 
protein-coding genes were used to examine sequence divergence. 

other four species (D. ananassae, D. pseudoobscura, D. virilis and 
D. mojavensis). c, K-means clustering (K = 10, species-order fixed) of 
expression ratios where s.d. > 0.5. Female-biased (red), male-biased (blue) 
and non-biased (black) expression is indicated. d, Examples of gene clusters 
that are indicated on the Eisengram (c). Species (x axis) and log2 female:male 
expression ratio (y axis) of common orthologues are shown. 

Multiple substitutions occur at a given site between distantly related 
species (for example, D. melanogaster and D. mojavensis) making 
Ka/Ks ratios much less reliable, and therefore Ka/Ks ratios were used 
only within the melanogaster subgroup (Fig. 4c, d). Genes with male- 
biased expression were expected to show higher Ka/Ks ratios. 
Indeed, common orthologues with male-biased expression had 
Ka/Ks values within the melanogaster subgroup (0.129) more than 
two times those of common orthologues with female-biased expression 
(0.061). Interestingly, common orthologues with non-biased 
expression showed intermediate Ka/Ks values. We observed a strong 
correlation between expression and sequence divergence among 
the genes showing the greatest expression divergence (Fig. 4c), as 

Figure 4 | Relationship between sex-biased expression, gene content and 
sequence divergence. Gene content and expression of common orthologues 
(a) and species-restricted (b) genes. The percentages of all genes (black) and 
with female (red) or male (blue) -biased expression are shown. Significant 
differences (P < 10⁻², chi-squared test) between sex-biased classes and total 
genes are indicated (asterisks). See Supplementary Fig. 5 for paralogues. 
Average Ka/Ks ratios within the melanogaster subgroup for common 
orthologues with high or low expression-ratio s.d. (c) and for all common 
orthologues or species-restricted genes (d). Bars are colour-coded as in 
a, with the addition of grey bars, which represent non-biased expression. 
Significant differences (P< 10 7, Mann–Whitney test) between common 
orthologues with constrained expression and variable expression, or 
between common orthologues and species-restricted genes are indicated 
(asterisks). 

has also been seen in mammals. Additionally, species-restricted 
genes had higher sequence divergence than common orthologues 
for all expression categories (Fig. 4d), as has been seen in vertebrates. 
Perhaps expression divergence, gene turnover, sex-bias 
and sequence divergence of individual genes are often coupled to 
the same selective forces. 

The contrasting divergence and turnover patterns of genes with 
male-biased expression relative to those with female-biased expression 
are somewhat surprising. Reproduction is the function of a couple, 
not an individual; therefore co-evolution of reproductive traits is 
expected to occur. For example, selection for sperm tail length in 
Drosophila males is coupled to selection for length of the seminal 
receptacle in females. There are a number of possible explanations. 
There may be greater de novo generation of genes with male-biased 
expression as a result of simple sequence requirements for core promoter 
generation and extremely high levels of RNA polymerase in 
spermatocytes. This combination might result in excessive transcription 
of intragenic regions. A few of these new genes with 
male-biased expression might be functional, but most of these 
‘de novo’ genes would be expected to rapidly degenerate. Alter- 
natively, genes required for oogenesis may be more constrained 
because of pleiotropy or the under-representation of paralogues 
with partially overlapping functions. Many D. melanogaster genes 
required for female fertility are also required for organismal 
viability, and genes with clear multiple functions, such as those 
encoding ribosomal proteins, are overexpressed in ovaries relative 
to testes. Finally, male–male competition might be particularly 
strong. The addition of more sequenced genomes will provide 
ample opportunities to explore these questions further. 


METHODS SUMMARY 

Flies. Species were grown on standard media (Tucson Drosophila Stock Center). 
We isolated messenger RNA from adult females and males grown at 22 °C (5-7 
days post eclosion), and labelled and hybridized using standard methods. 
Arrays. Oligonucleotide arrays of 50-mers (NimbleGen Systems) were designed 
against draft assemblies and ab initio annotations we contributed to the 
gene model reconciliation (Supplementary Table 2). A D. melanogaster 60-mer 
expression array (NimbleGen Design ID 2005-10-17_Dmel4_60mer_exp) 
designed on the basis of FlyBase annotation V4.2 was used for D. melanogaster 
hybridizations. For this report, we remapped all array elements to current consensus 
gene models. Our expression results and conclusions were similar using 
the original models. Because low-magnitude expression divergence is difficult to 
distinguish from noise, we performed at least quadruplicate replicates for each 
species and only channels passing a stringent quality control regimen were used 
in the final analysis (72 channels total). Full platform descriptions and data are 
available at the GEO under accession GSE6640. 

For each gene, log2 intensities for female and male expression were compared 
by non-parametric two-sample Mann–Whitney tests to generate the significance 
of sex-biased expression (P = 0.01) and ratios of each gene were calculated as the 
average probeset intensity in female channels divided by the intensity in male 
channels. Common orthologues are present in all 7 species and species-restricted 
genes are present in at least one species, but absent in ≥1 species. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 

METHODS 

Flies. Stocks (Tucson Drosophila Stock Center) of D. simulans (14021-0251.198 
and 14021-0251.011), D. yakuba (14021-0261.01), D. ananassae (14024- 
0371.13), D. pseudoobscura (14011-0121.94), D. mojavensis (15081-1352.22) 
and D. virilis (15010-1051.87) were grown on standard cornmeal media 
(Tucson Drosophila Stock Center), with the exception of D. pseudoobscura, D. 
mojavensis and D. virilis, which were grown on Banana-Opuntia media (Tucson 
Drosophila Stock Center). 

Arrays. Oligonucleotide 50-mer arrays (NimbleGen Systems) were designed on 
the basis of draft or versioned genomic assemblies of six Drosophila species 
(D. simulans assembly: PCAP assembly for white501, GSC, Wash U, 01 
December 04; D. yakuba assembly: GSC (WashU), 07 April 2004; D. ananassae 
assembly: Arachne Assembler, Agencourt, 06 December 2004; D. pseudoobscura 
assembly: v. 1.03 from FlyBase, December 2004; D. mojavensis assembly: Arachne 
Assembler, Agencourt, 06 December 2004; D. virilis assembly: Arachne 
Assembler, Agencourt, 29 October 2004). An average of 10 array probes were 
selected without bias with respect to position within each of our OLIV gene 
models. The OLIV set includes both high- and low-confidence models, which 
included non-overlapping draft EIS gene models based on D. melanogaster 
orthology (M. Eisen laboratory, v1.0, Feb 2005), ab initio GeneID predictions 
using the D. melanogaster training set, FlyBase genes and expressed sequence 
tag sequence from GenBank. Array probes were remapped to the final genome 
assemblies and GLEANR gene predictions by BLAT V25x1 (ref. 33). Only 
probes uniquely and perfectly matched to both annotation and assembly were 
used for final analysis. 

Hybridization was according to the manufacturer’s instructions (NimbleGen 
Systems), except that hybridization was done in custom-made chambers. Arrays 
were scanned on an Axon GenePix 4000B (Molecular Devices Corporation) and 
data were captured using NimbleScan 2.1 (NimbleGen Systems). For each spe- 
cies, at least four hybridizations, including technical (dye-flipped) replicates for 
each of two discrete samples (biological replicates) were performed. Extra hybridizations 
were performed for a different D. simulans strain (14021-0251.011). 
Data handling. We used a multi-step quality control pipeline. Hybridization 
channels were retained when experimental intensities were >1 s.d. above mean 
on-spot background (from non-Drosophila control elements) and the inter- 
quartile ranges of log intensities were >1. Passed channels were normalized 
using variance stabilization normalization. Signal variability between replicate 
channels was then tested by calculating the inter-quartile range of the relative log 
expression values for each channel against a virtual reference (the median value 
in all replicate channels for each array element). The channels with inter-quartile 
ranges of the relative log expression values greater than one were rejected. Passed 
channels were re-normalized by variance stabilization normalization from the 
raw data. This approach does not over-normalize the data while assuring that 
hybridization intensity is consistent between replicate channels. 
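
The replicate-consistency filter described above can be sketched as follows. This is an illustrative reading of the text rather than the authors' pipeline: the function names, the dictionary layout and the quartile convention (the "inclusive" method of Python's statistics module) are assumptions, and the normalization step itself is not reproduced.

```python
import statistics


def rle_iqr(channels):
    """Inter-quartile range of relative log expression for each channel.

    channels maps a channel name to a list of log intensities, one value
    per array element, in the same element order for every channel.
    """
    names = list(channels)
    n_elements = len(channels[names[0]])
    # Virtual reference: per-element median across all replicate channels.
    reference = [
        statistics.median(channels[name][i] for name in names)
        for i in range(n_elements)
    ]
    result = {}
    for name in names:
        rle = [x - ref for x, ref in zip(channels[name], reference)]
        q1, _, q3 = statistics.quantiles(rle, n=4, method="inclusive")
        result[name] = q3 - q1
    return result


def passing_channels(channels, max_iqr=1.0):
    """Keep channels whose RLE inter-quartile range does not exceed max_iqr."""
    return [name for name, iqr in rle_iqr(channels).items() if iqr <= max_iqr]


# Toy example with three replicate channels; channel "c" is deliberately noisy.
toy_channels = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [1.0, 2.0, 3.0, 4.0, 5.0],
    "c": [0.0, 4.0, 3.0, 8.0, 1.0],
}
kept = passing_channels(toy_channels)
```

Using the per-element median as the reference makes the check robust to a single aberrant channel, which is presumably why a virtual reference rather than any one replicate is used.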

For each gene, log2 intensities from female-sample single channels and 
male-sample single channels were compared by non-parametric two-sample 
Mann–Whitney tests to generate a significance measure. Ratios were then 
calculated as the average probeset intensity in female channels divided by the 
average probeset intensity in male channels for each gene. P-values were 
false-discovery-rate-corrected. The cut-off for sex-biased expression we used was 
false-discovery-rate-adjusted P = 0.01. Expression was called ‘below background’ 
when the average probeset intensity was less than the average intensity 
of negative controls (probes targeting four Arabidopsis genes and two yeast 
genes) in both sexes. 
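
The steps that follow the Mann–Whitney test (multiple-testing correction, ratio calculation and the sex-bias call) can be sketched as below. Several points are assumptions rather than details from the text: the Benjamini–Hochberg procedure is assumed for the false-discovery-rate correction, intensities passed to the ratio function are assumed to be on a linear scale, adjusted P-values at or below the cut-off are treated as significant, and all function names are hypothetical. The Mann–Whitney P-values themselves are taken as input.

```python
import math


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P-values (assumed FDR procedure)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def log2_ratio(female, male):
    """log2 of mean female intensity over mean male intensity."""
    mean_female = sum(female) / len(female)
    mean_male = sum(male) / len(male)
    return math.log2(mean_female / mean_male)


def call_sex_bias(ratio, adjusted_p, cutoff=0.01):
    """Classify a gene from its log2 female:male ratio and adjusted P-value."""
    if adjusted_p > cutoff:
        return "non-biased"
    return "female-biased" if ratio > 0 else "male-biased"


# Toy raw P-values for four genes; the numbers are invented.
adjusted = bh_adjust([0.01, 0.04, 0.03, 0.005])
```

With a positive log2 ratio meaning higher expression in females, the sign convention matches the female:male ratios plotted in the figures.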

Orthology calls are as described. Common orthologues represent orthologues 
present in all 7 species and species-restricted genes are present in one species, but 
absent in ≥1 species. Paralogues are excluded from most of the data analysis. 
Multiple sequence alignments of orthologues were imported using the seqinR 
package. Ka/Ks estimates adjusted for differences in transition and transversion 
rates were calculated from these alignments. Average Ka/Ks of common 
orthologues were calculated from all possible Ka/Ks values within the melanogaster 
subgroup, then median Ka/Ks values were calculated for each category 
of genes with different sex-biased expression and genes with different expression 
divergence. 

DNA sequence divergence and expression divergence (1 − Pearson’s r between 
the sex-biased expression ratios of two species) between each species pair were 
calculated. DNA sequence divergence was presented using genomic mutation 
distance by the method described previously. Neighbour-joining trees were 
then inferred using DNA sequence divergence and expression distance from 
six species separately in MEGA4 (ref. 37). The common orthologues with most 
variable expression among species (s.d. > 0.5) were K-means clustered with 10 
nodes using the euclidean similarity metric in Cluster 3.0/Tree-View. 
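
The expression distance defined above (1 − Pearson's r between the sex-biased expression ratios of two species) can be sketched as a pairwise matrix. The function names, the dictionary layout and the toy values are assumptions made for illustration; tree inference and clustering, which the study carried out in MEGA4 and Cluster 3.0, are not reproduced here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def expression_distance_matrix(ratios):
    """Pairwise expression distance, 1 - Pearson's r, between species.

    ratios maps a species name to a list of log2 female:male ratios for
    the same common orthologues, in the same order for every species.
    """
    species = list(ratios)
    return {
        a: {b: 1.0 - pearson_r(ratios[a], ratios[b]) for b in species}
        for a in species
    }


# Toy ratios for three hypothetical species and three orthologues.
toy_ratios = {
    "sp_A": [1.0, 2.0, 3.0],
    "sp_B": [2.0, 4.0, 6.0],   # perfectly correlated with sp_A
    "sp_C": [3.0, 2.0, 1.0],   # perfectly anti-correlated with sp_A
}
distances = expression_distance_matrix(toy_ratios)
```

The distance ranges from 0 for perfectly correlated ratios to 2 for perfectly anti-correlated ones, so a matrix of this kind can be passed to any distance-based tree-building method.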

D. melanogaster v.4.3 and D. pseudoobscura v.2.0 sequence and annotation 
from FlyBase were used as queries against the final genome assemblies of the 
other six species by TBLASTN. 

Demasculinization of X chromosomes in the 
Drosophila genus 


David Sturgill, Yu Zhang, Michael Parisi & Brian Oliver 


X chromosomes evolve differently from autosomes, but general 
governing principles have not emerged. For example, genes 
with male-biased expression are under-represented on the 
X chromosome of D. melanogaster, but are randomly distributed 
in the genome of Anopheles gambiae. In direct global profiling 
experiments using species-specific microarrays, we find a nearly 
identical paucity of genes with male-biased expression on D. melanogaster, 
D. simulans, D. yakuba, D. ananassae, D. virilis and D. 
mojavensis X chromosomes. We observe the same under-representation 
on the neo-X of D. pseudoobscura. It has been suggested 
that precocious meiotic silencing of the X chromosome accounts 
for reduced X chromosome male-biased expression in nematodes, 
mammals and Drosophila. We show that X chromosome genes 
with male-biased expression are under-represented in somatic 
cells and in mitotic male germ cells. These data are incompatible 
with simple X chromosome inactivation models. Using expression 
profiling and comparative sequence analysis, we show that selec- 
tive gene extinction on the X chromosome, creation of new genes 
on autosomes and changed genomic location of existing genes 
contribute to the unusual X chromosome gene content. 

Several models have been advanced to explain the peculiar gene 
content on X chromosomes; these can be divided into those 
driven by gene-by-gene and those driven by chromosome-wide selective pressures. 
Antagonistic selection is a popular gene-by-gene model. Females and 
males are under different selective pressures and deploy the genome 
differently such that genes and expression states advantageous for 
one sex can be disadvantageous to the other. This is expected to have 
a profound influence on X chromosomes because hemizygosity and 
immediate selection of recessive alleles in males should be masculi- 
nizing, whereas the increased residency time of X chromosomes in 
females and immediate selection of dominant alleles should be a 
counteracting force. A popular chromosome-wide model suggests 
that X chromosome inactivation during spermatogenesis is responsible 
for the reduced number of X chromosome genes with male-biased 
expression. The X chromosome is precociously condensed, 
and thus silenced, in preparation for male meiosis, owing to the 
absence of a homologous pairing partner. 

Before addressing particular models, we asked if X chromosome 
sex-biased expression patterns were consistent across the genus. 
We determined female:male expression ratios on species-specific 
microarrays®, normalized sex bias across species, and parsed the 
expression data by chromosome arm (Fig. 1a and Supplementary 
Fig. 1). Homologous linkage groups in the Drosophila genus are 
referred to as ‘Muller’s elements’ to standardize discordant species- 
specific chromosome nomenclature’. Muller A is part of the 
X chromosome in all the species examined. Muller A genes with 
male-biased expression were under-represented relative to autosomes 
in each of the seven species (30–43% less than expected, 
P < 10⁻⁴ by chi-squared test; Supplementary Table 1). No other 
chromosome arms showed a genus-wide significant departure 
from a random distribution, although we did observe a modest 
over-representation of genes with male-biased expression on 
Muller B (P < 10⁻⁴ by chi-squared test) in D. melanogaster, D. simulans, 
D. ananassae and D. mojavensis. Therefore, a dearth of genes 
with male-biased expression on the X chromosome is characteristic 
of the genus. 
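
To make the test concrete, the sketch below computes a Pearson chi-squared statistic (1 degree of freedom, no continuity correction) for a 2 × 2 table of chromosome arm versus expression class. It is one plausible form of the chi-squared test named in the text, not necessarily the exact test the authors ran. The gene counts are hypothetical.

```python
from math import erfc, sqrt


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared test for the table [[a, b], [c, d]].

    Rows: arm of interest / all other arms.
    Columns: male-biased genes / all other genes.
    Returns (chi2, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column must contain genes")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of the chi-squared distribution with 1 d.f.
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


def fraction_of_expected(a, b, c, d):
    """Observed male-biased genes on the arm relative to random expectation."""
    n = a + b + c + d
    expected = (a + b) * (a + c) / n
    return a / expected


# Hypothetical counts: 60 of 1,000 X-linked genes are male-biased,
# versus 500 of 5,000 autosomal genes.
chi2, p = chi2_2x2(60, 940, 500, 4500)
ratio = fraction_of_expected(60, 940, 500, 4500)
```

With these made-up counts the arm carries about 64% of the male-biased genes expected under a random distribution, which is the kind of deficit reported as "less than expected" above.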

Formally, reduced male-biased expression could be due to an his- 
torical accident on the Muller A element rather than a property of the 
X chromosome itself. This is directly testable owing to the chro- 
mosome arrangement in D. pseudoobscura, in which the ancestral 
autosome, Muller D, fused to the X chromosome ~8-12 million 
years ago'’. The neo-X chromosome (Fig. 1a) also showed a strong 
under-representation of male-biased expression (37% of the 
expected random distribution, P < 10⁻⁴ by chi-squared test). The 
consistent under-representation of genes with male-biased expression 
on the D. pseudoobscura neo-X and on the ancestral 
X chromosomes suggests that the effect of linkage on male-biased 
expression is a property of X chromosomes and that it takes less than 
12 million years to reach a stereotypical depleted status. 

Important predictions of the X chromosome inactivation model 
for reduced male-biased expression are readily testable: Expression 
of all X chromosome genes should be reduced, and X chromosome 
deficits in male-biased expression should be restricted to late sper- 
matocytes. There was no global reduction in X chromosome gene 
expression in males (Fig. 1b), suggesting that global inactivation in 
spermatocytes does not disrupt the overall balance of gene expression 
at the organismal level. These data are also consistent with 
genus-wide high-fidelity X chromosome dosage compensation in 
both the male soma and germline as observed in D. melanogaster'’. 
The paucity of X chromosome genes with male-biased expression 
seems due to reduced numbers of genes with overt male-biased 
expression, not a chromosome-wide effect. Although these data do 
not support the X chromosome inactivation model, they are also 
based on whole-animal expression profiles that greatly dilute late 
spermatocyte expression. 

If male-biased expression from X chromosomes is reduced owing 
to a germline-specific event, then there should be no similar reduc- 
tion in the soma. Most orthologues show sex-biased expression in all 
tested species®. Therefore, we estimated somatic sex-biased expres- 
sion by computationally removing orthologues with testis-biased 
expression in D. melanogaster (1,569 genes). We observed significant 
under-representation (P < 10⁻⁴ by chi-squared test) of 
X chromosome genes with putative somatic male-biased expression 
in five of the seven species (Fig. 1c). This included the D. pseudoobs- 
cura neo-X chromosome. We also re-analysed non-gonadal D. mel- 
anogaster expression data (Fig. 2a)’. Whereas the number of genes 
Figure 1 | Genes that show sex-biased expression, on chromosome arms. 
a, Percentage of genes with female-biased (red), non-biased (grey) and male- 
biased (blue) expression on chromosome arms. Muller’s elements and arms 
are indicated (X chromosomes in bold). Significant under-representation 
from a random distribution of genes in an expression class (chi-squared test) 
is noted (*P < 10⁻⁴). The top 20% of differentially expressed genes (by 
ranked P values) are assigned a sex-biased expression class. b, Box plots of 
Figure 2 | Male-biased expression not subject to X chromosome 
inactivation. a—c, Percentage of D. melanogaster genes with male-biased 
expression. Significant under-representation as in Fig. 1 is noted (*P<10 *; 
*P<10 *). a, Non-gonadal soma (gonadectomized carcasses). b, c, Testis 
from mutants with mitotically active germline tumours’*. Male-biased 
expression was determined by present calls in wild-type and mutant testis 
and absent calls in wild-type ovary expression profiles’* from FlyMine”’. 
Testis from bgcn males (primary spermatocyte-biased; b) accumulates 
mitotically active cysts of interconnected germ cells, whereas testis from 
males overexpressing os (nos-os) in a bgcn background (male germline 
stem-cell-biased; c) accumulates mitotically active germ cells that undergo 
complete cytokinesis’*. d-f, Box plots of average intensities (arbitrary scale) 
across hybridizations for wild-type (d), bgcn  (e), or nos-os; bgcn  (f) testis 
separated by chromosome arm. Twenty-fifth to seventy-fifth percentiles 
(boxes), medians (lines in boxes) and ranges (whiskers) are indicated. 
average hybridization intensities for males by chromosome arm. Twenty-fifth 
to seventy-fifth percentiles (boxes), medians (lines in boxes) and 
ranges (whiskers) are indicated. c, As in a, but with genes with predicted 
testis-biased expression removed. D. melanogaster, D. mel; D. simulans, 
D. sim; D. yakuba, D. yak; D. ananassae, D. ana; D. pseudoobscura, D. pse; 
D. virilis, D. vir; D. mojavensis, D. moj. 


with male-biased expression is much lower in the non-gonadal soma, 
we observed clear under-representation of X chromosome genes with 
male-biased expression (P< 107°), as previously observed’. 

To test the X chromosome inactivation model more directly, 
we used testis expression data from D. melanogaster mutants 
blocked during primary spermatocyte amplification'*. Testes from 
these mutants accumulate vast numbers of mitotic germ cells. We 
used independently profiled ovary samples as a reference’’. 
X chromosome genes with testis-biased expression were under-represented 
in these mutant profiles (P < 10⁻⁴ by chi-squared 
test, Fig. 2b, c). Additionally, there was no evidence for global 
reduction in the expression of all X chromosome genes in either 
wild-type or mutant testis (Fig. 2d–f). These data suggest that the 
X chromosome is a poor location for germline male-biased expres- 
sion before X chromosome inactivation, which occurs in late post- 
mitotic spermatocytes®. X chromosome inactivation may be too late 
to affect most X chromosome transcript levels. We suggest that the 
paucity of X chromosome genes with male-biased expression in 
Drosophila is due to selection at the gene level, not global chromatin 
status. 

Regardless of the thematic model, there are a limited number 
of physical mechanisms for achieving the observed under-representation 
of X chromosome genes with male-biased expression. 
We explored whether X chromosome genes with male-biased 
expression convert expression class, preferentially move to auto- 
somes, are preferentially lost, or fail to arise on X chromosomes, 
using the D. pseudoobscura neo-X chromosome as a well-controlled 
natural experiment. We inferred the expression pattern of the 
ancestral Muller D of D. pseudoobscura before translocation to the 
X chromosome by using expression data from six phylogenetically 
flanking species. The extant expression pattern was directly 
Figure 3 | The neo-X chromosome. a, D. pseudoobscura neo-X chromosome 
formation cartoon. Muller A, D and E are attached to different centromeres 
(filled circles) in D. pseudoobscura relative to flanking species. b, Bar plot 
showing changes to D. pseudoobscura Muller D neo-X chromosome 
(orange) and Muller E autosomal arm (green), based on the predicted 
expression pattern of the ancestral autosomes. Categories are: switches in 
sex-biased expression (F, female-biased; M, male-biased), not found 


determined. In the other species, Muller D and Muller E are arms 
of the same chromosome or are individual chromosomes. Muller E 
provided a reference for the analysis (Fig. 3a, Supplementary Tables 
3, 4, and Supplementary Fig. 2). 

Of the 242 Muller D genes with ancestral or extant male-biased 
expression (Fig. 3b) that can be unambiguously assigned linkage 
and expression patterns, 216 (89%) remain in place on the neo- 
X chromosome. Only one gene on the neo-X chromosome switched 
from male-biased expression to female-biased expression. Although 
this particular gene, monkey king (mkg-p), has undergone an 
interesting functional radiation and sex-biased expression switches 
in the melanogaster subgroup, sex-bias change on the neo-X 
chromosome is not significantly different from that on Muller E (no 
switches). Thus, there is no overt evidence that expression changes 
are responsible for under-representation of genes with male-biased 
expression on X chromosomes. 

We found that 5% of genes with male-biased expression on the 
ancestral autosome had been lost from the genome of D. pseudoobs- 
cura, whereas only 1% of ancestral genes with male-biased expression 
were lost from Muller E (P< 0.05 by chi-squared test). Only 2% of 
genes with male-biased expression on the neo-X chromosome were 
unique to D. pseudoobscura, whereas 15% of the Muller E genes with 
male-biased expression in D. pseudoobscura are unique. The relative 
excess of new genes with male-biased expression on the autosome is 
highly significant (P < 10⁻⁴ by chi-squared test). Determining if a 
gene is absent or new rather than highly diverged is difficult. 
However, exploration of genus-wide gene content using relaxed 
alignment instead of the consensus gene models indicates that many 
of these genes are likely to be completely absent rather than highly 
diverged (Supplementary Fig. 3). We also found clear evidence that 
rates of non-synonymous substitution (Ka) relative to synonymous 
substitution (Ks) are higher for X chromosome genes with male-biased 
expression than for those on autosomes (Supplementary 
Fig. 4). These data suggest that there is net loss of genes with male- 
biased expression from the X chromosome (as well as more subtle 
changes to sequence) and that functions lost from the X chromosome 
may be replaced by new gene formation on autosomes. 

Analysis of gene transposition showed a net movement of genes 
with male-biased expression off the neo-X chromosome, with 3% of 
ancestral genes with male-biased expression moving to autosomes 
and no genes with male-biased expression moving from Muller E 
(P < 10⁻⁴ by chi-squared test). Male-biased expression was always 
maintained in the new location. There was no significant difference 
between the neo-X and Muller E with respect to newly arriving genes, 
because 1% of ancestral genes with male-biased expression moved to 
(absent), only found in D. pseudoobscura (new), and genes moving from 
(move off) and to (move on) the indicated arm. Significant differences 
(chi-squared test) between Muller D and Muller E are noted (**P < 10⁻⁴; 
*P < 0.05). The ancestral condition of the neo-X chromosome was inferred 
by examining chromosome linkage and expression on homologous arms in 
species that diverged from the Drosophila lineage before and after 
D. pseudoobscura. 


the neo-X chromosome and 2% moved to Muller E. These data 
clearly support the idea that movement of genes with male-biased 
expression to autosomes promotes long-term survival of those genes. 
Gene loss, gain and movement unambiguously accounted for 13% 
of the loss of male-biased expression from the neo-X chromosome 
and 73% of the gain of male-biased expression on Muller E 
(Supplementary Table 5). If these unambiguous results reflect the 
changes that we are unable to trace, then gene loss, gain and move- 
ment are the dominant mechanisms for depleting male-biased 
expression on the X chromosome relative to autosomes. 

Our data strongly support the idea that the X chromosome 
has an unusual distribution of genes with male-biased expression. 
Although there is compelling evidence that precocious 
X chromosome inactivation occurs in at least some species, and that 
this may contribute to the reduced density of genes with male-biased 
expression on X chromosomes*”, our data indicate that this is not 
a major contributor to the pattern in the Drosophila genus. We 
suggest that the dominant mechanisms for achieving this under- 
representation are preferential extinction of X chromosome genes 
and formation of new autosomal genes with male-biased expression’, 
along with movement of genes with male-biased expression from the 
X chromosome. Somewhat surprisingly, altered gene expression 
does not seem to be a major contributor. 


METHODS SUMMARY 
Array data sources. Expression data for sex-sorted whole adults of the seven 
Drosophila species are from GEO (GSE6640). Data for D. melanogaster 
gonadectomized male and female carcasses on the FlyGEM platform were published 
previously (GEO GSE442). Affymetrix data for bgcn and UAS-os, bgcn/nos-Gal4-VP16, 
bgcn mutant testes expression were obtained from GEO 
(GSE4188, GDS2228) and wild-type ovary data were obtained from FlyMine 
release v.8.0. 
Sequence analysis. D. melanogaster annotation v.4.3 (ref. 21) and Comparative 
Analysis Freeze 1 (CAF1) assemblies were used throughout. Orthology 
relationships between the seven species, including orthologue, paralogue and 
no homologue, were assigned to consensus gene predictions. Gene content changes 
were also determined by comparing D. melanogaster amino acid sequence 
against six-frame translated genomic DNA of each species by BLAST tblastn. 
For gene loss and translocation, the ancestral state of the lost/moved gene was 
inferred by the consensus expression class and chromosome linkage for each 
gene. For the D. pseudoobscura neo-X chromosome (Muller D) and autosomal 
Muller E, gene gain/loss/movement was manually counted gene-by-gene. The 
ancestral gene content of this arm was inferred by phylogeny using genes present 
in species that diverged before (D. virilis and D. mojavensis) and after (D. ana- 
nassae and the melanogaster subgroup) the melanogaster/obscura group split. 
The ancestral state refers to the chromosome linkage and expression class of 
genes at the origin of this node in a rooted phylogenetic tree. 
Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS 


Expression data for whole adults of the seven Drosophila species were generated 
with custom-designed oligonucleotide arrays (NimbleGen Systems). Array 
design, sample preparation and labelling, and data handling were as described. 
Data for D. melanogaster gonadectomized male and female carcasses were 
re-analysed from raw intensity; these data were re-normalized by within-slide 
print-tip loess normalization, followed by between-slide quantile normalization 
using Bioconductor. Differential expression was then called in the same manner 
as for the custom NimbleGen arrays, using Mann–Whitney tests with 
false-discovery rate correction. For Affymetrix data, we used ‘present’ and ‘absent’ 
calls to determine sex-bias. Intensity data were used directly from GEO without 
additional normalization, and averaged over replicates. Data were remapped to 
match DrosGenome2 to DrosGenome1 identifiers by Affymetrix. 
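
The call-based rule for the Affymetrix data can be sketched as below, following the criterion given in the Figure 2 legend (present in testis profiles, absent in the wild-type ovary profile). The function name, the dictionary layout and the use of "P"/"A"/"M" as call codes are assumptions made for illustration.

```python
def male_biased_by_calls(testis_profiles, ovary_profile):
    """Return genes called present in every testis profile and absent in ovary.

    Each profile maps a gene identifier to a detection call:
    "P" (present), "A" (absent) or "M" (marginal).
    Only genes found in all profiles are considered.
    """
    shared = set(ovary_profile)
    for profile in testis_profiles:
        shared &= set(profile)
    return {
        gene
        for gene in shared
        if ovary_profile[gene] == "A"
        and all(profile[gene] == "P" for profile in testis_profiles)
    }


# Hypothetical calls, for illustration only.
wild_type_testis = {"g1": "P", "g2": "P", "g3": "A", "g4": "P"}
mutant_testis = {"g1": "P", "g2": "M", "g3": "A", "g4": "P"}
wild_type_ovary = {"g1": "A", "g2": "A", "g3": "A", "g4": "P"}
biased = male_biased_by_calls([wild_type_testis, mutant_testis], wild_type_ovary)
```

Marginal calls are treated here as not present, which is a conservative choice made for this sketch rather than something stated in the text.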

‘Gained genes’ were defined as genes present in only one species 
(Supplementary Fig. 2). ‘Lost genes’ are genes absent from only one species. ‘Moved 
genes’ are those that are present in all seven species, but with different 
chromosome linkage in at least one species. For gene loss and translocation, the 
ancestral state of the lost/moved gene is inferred by the consensus expression 
class and chromosome linkage for each gene. Genes with different arm linkage, 
owing to a large pericentric inversion in the D. yakuba lineage, were not 
counted as translocations. 
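
These definitions translate into a simple classification rule. The sketch below is illustrative only: the function name, the "conserved" and "other" labels and the data layout are assumptions, and the D. yakuba pericentric-inversion exception described above is not modelled.

```python
def classify_gene(arm_by_species, all_species):
    """Classify one gene from its Muller element in each species.

    `arm_by_species` maps species -> Muller element for the species in
    which the gene is present; species lacking the gene are omitted.
    """
    present = [s for s in all_species if s in arm_by_species]
    if len(present) == 1:
        return "gained"  # present in only one species
    if len(present) == len(all_species) - 1:
        return "lost"  # absent from only one species
    if len(present) == len(all_species):
        arms = {arm_by_species[s] for s in all_species}
        return "moved" if len(arms) > 1 else "conserved"
    return "other"  # patchy distribution, not covered by the definitions


SPECIES = ["D. mel", "D. sim", "D. yak", "D. ana", "D. pse", "D. vir", "D. moj"]

# Hypothetical genes, for illustration only.
everywhere_on_d = {s: "D" for s in SPECIES}
moved_in_pse = dict(everywhere_on_d, **{"D. pse": "E"})
missing_in_pse = {s: "D" for s in SPECIES if s != "D. pse"}
only_in_pse = {"D. pse": "E"}
```

Calling `classify_gene(moved_in_pse, SPECIES)` returns `"moved"`, because the gene is present in all seven species but sits on a different element in one of them.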

We also calculated content changes independent of gene prediction in the 
newly sequenced species. We performed a comparison of D. melanogaster amino 
acid sequence against six-frame translated genomic DNA of each species via 
BLAST tblastn. BLAST results were parsed and hits tiled together using 
BioPerl. D. melanogaster genes were called ‘absent’ in the other species if 
there was no hit below an E-value cut-off (10-° and 10 '” are shown in 
Supplementary Fig. 3). 
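
The absence rule can be sketched as a filter over the best E-value per query gene. This assumes the BLAST output has already been parsed into a mapping from gene to E-values; the function name and the example cut-off of 1e-5 are placeholders chosen for illustration, not the thresholds used in the paper.

```python
def absent_genes(query_genes, evalues_by_gene, cutoff):
    """Return query genes with no hit whose E-value is below `cutoff`.

    `evalues_by_gene` maps a gene identifier to the E-values of its hits
    in one target genome; genes without any hit may be missing entirely.
    """
    absent = set()
    for gene in query_genes:
        hits = evalues_by_gene.get(gene, [])
        if not any(evalue < cutoff for evalue in hits):
            absent.add(gene)
    return absent


# Hypothetical parsed results for one target species.
queries = ["geneA", "geneB", "geneC"]
parsed_hits = {
    "geneA": [1e-40, 3e-12],  # strong hit, gene is called present
    "geneB": [0.02, 1.5],     # only weak hits
}
called_absent = absent_genes(queries, parsed_hits, cutoff=1e-5)
```

Running the same filter with a stricter cut-off shows how sensitive the absent set is to the threshold, which is presumably why two cut-offs are reported in the supplementary figure.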

For the D. pseudoobscura neo-X chromosome (Muller D) and autosomal 
Muller E, gene gain/loss/movement was manually counted gene-by-gene. The 
ancestral gene content of this arm was inferred by phylogeny, using genes present 
in species that diverged before (D. virilis and D. mojavensis) and after (D. ana- 
nassae and the melanogaster subgroup) the melanogaster/obscura group split. 
The ancestral state refers to the chromosome linkage and expression class of 
genes at the origin of this node in a rooted phylogenetic tree. For both Muller 
D and E, we first selected genes that are linked to these arms in any species. These were 
then filtered to include those that are Muller D- or Muller E-linked in at least one 
species downstream and one species upstream, or in D. pseudoobscura itself. 
Genes with male-biased expression in at least one species were then retained. 
Each remaining gene was then manually assigned a putative ancestral expression 
class and arm location and post neo-X chromosome fate that considered the 
entire Drosophila lineage. There are cases involving species on the ‘edge’ of the 
tree that could not be resolved by this approach. For example, a gene that is 
Muller B-linked in D. mojavensis/D. virilis (on the edge of the rooted tree), but 
Muller D-linked in every other species would not be counted, because it is not 
possible to discriminate the direction of translocation. Our approach includes 
only unambiguous cases (Supplementary Tables 2-4). 
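
The logic of accepting only unambiguous cases can be illustrated with a small sketch. It is a deliberate simplification of the manual, gene-by-gene assignment described above: an ancestral arm is reported only when a species that diverged before the split and a species that diverged after it agree. The function name and data layout are assumptions.

```python
# Species flanking D. pseudoobscura, as grouped in the text.
UPSTREAM = ("D. virilis", "D. mojavensis")
DOWNSTREAM = ("D. ananassae", "D. melanogaster", "D. simulans", "D. yakuba")


def infer_ancestral_arm(arm_by_species):
    """Return the ancestral Muller element, or None if it is ambiguous.

    `arm_by_species` maps species -> Muller element for species in which
    the gene is present. The arm must be supported by at least one
    upstream and one downstream species, and by exactly one element.
    """
    upstream = {arm_by_species[s] for s in UPSTREAM if s in arm_by_species}
    downstream = {arm_by_species[s] for s in DOWNSTREAM if s in arm_by_species}
    shared = upstream & downstream
    if len(shared) == 1:
        return next(iter(shared))
    return None  # direction of any translocation cannot be resolved


# The unresolved example from the text: Muller B in the outgroup species,
# Muller D everywhere else.
edge_case = {s: "B" for s in UPSTREAM}
edge_case.update({s: "D" for s in DOWNSTREAM})

# A resolvable case: Muller D on both sides of the split.
clear_case = {s: "D" for s in UPSTREAM + DOWNSTREAM}
```

Here `infer_ancestral_arm(edge_case)` returns `None`, matching the statement that such a gene would not be counted.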

Multiple sequence alignments of orthologues were imported using the seqinR 
package. Ka/Ks estimates adjusted for differences in transition and transversion 
rates were calculated from these alignments. 

All computation was performed in the R/Bioconductor environment. 
Maternal nodal and zebrafish embryogenesis 


Arising from: A. V. Gore et al. Nature 438, 1030-1035 (2005). 


In fish and amphibians, the dorsal axis is specified by the asymmetric 
localization of maternally provided components of the Wnt signal- 
ling pathway’”. Gore et al.’ suggest that the Nodal signal Squint (Sqt) 
is required as a maternally provided dorsal determinant in zebrafish. 
Here we test their proposal and show that the maternal activities of 
sqt and the related Nodal gene cyclops (cyc) are not required for 
dorsoventral patterning. 

Sqt and Cyc induce mesoderm and endoderm™. Embryos without 
zygotic sqt and cyc (Zcyc;Zsqt) lack all endoderm and most meso- 
derm’. Gore et al.’ suggest that maternal Sqt might also act as a dorsal 
determinant, because injection of antisense sqt morpholinos into 
unfertilized eggs induced dorsal defects’. However, we point out 
two potential caveats in their study. First, Gore et al.* do not provide 
evidence that their approach eliminates maternal sqt. In particular, 
two morpholinos were designed to prevent sqt pre-messenger RNA 
splicing, even though there is little, if any, evidence that maternal 
pre-mRNAs are deposited in the egg and spliced after oviposition. 
Second, regardless of its effect on maternal sqt, the morpholino 
approach of Gore et al. also blocks zygotic sqt activity. Hence, these 
experiments did not specifically test the requirement for maternal sqt. 

We first determined whether fully spliced sqt mRNA is present in 
ovaries and unfertilized eggs. Polymerase chain reaction with reverse 
transcription (RT-PCR) detected abundant spliced sqt mRNA, con- 
sistent with its cytoplasmic localization® (Fig. 1a). Because we per- 
formed the reverse transcription with not only oligo(dT) but also 
gene-specific primers, our analysis would have detected unspliced sqt 
RNA regardless of its polyadenylation state. The presence of spliced 
sqt mRNA in early embryos demonstrates that splice-blocking morpholinos 
cannot fully block maternal sqt activity, if they have any 
effect at all. 

To test conclusively the requirement for maternal sqt, one needs 
to generate embryos from sqt homozygous mutant mothers and 
wild-type fathers. We investigated whether the sqtcz35 allele lacks 
Figure 1 | Splicing and disruption of sqt RNA. a, Primers flanking the first 
intron of the sqt gene detected only spliced RNA (337 bp spliced versus 
968 bp unspliced) in complementary DNA prepared from total RNA of 
unfertilized eggs. Gene-specific primers (lanes 4 and 5) and oligo(dT) 
primers (lanes 1 and 2) were used, so that non-polyadenylated, unspliced sqt 
RNA could be detected. RT, reverse transcriptase. b, sqt mRNA is detected in 
wild-type (WT) embryos (lane 1) but not MZsqtcz35 mutants (M) (lane 2) at 
the 8-cell stage. c, Schematic of Sqt protein, showing signal sequence (SS), 
prodomain (white), and mature ligand domain (red). The sqtcz35 insertion 
and the N-terminally truncated putative product of sqtcz35 (T-Sqt) are 
indicated. d-f, Embryos injected with 50 pg of wild-type sqt mRNA have 
ectopic gsc expression (f), but injection of 150 pg of T-Sqt (e) has no effect. 
Additional supporting data and details of methods are available from the 
authors at http://www.mcb.harvard.edu/Schier/BennettBCASuppl07.pdf. 


Sqt activity. Sequencing of genomic DNA confirmed that sqtcz35 contains 
a 1,848-base-pair (bp) insertion (Fig. 1c). Homozygous sqtcz35 
embryos contained no detectable sqt mRNA at the 8-cell stage 
(Fig. 1b). In the late blastula, all sqt RNA detectable in mutants contained 
the 1.8-kilobase insertion, which introduces numerous stop 
codons in all reading frames (data not shown). Hence, sqtcz35 
mutants have strongly reduced levels of sqt mRNA, and the mutant 
allele cannot generate wild-type Sqt protein. The presence of an in-frame 
AUG codon 3′ to the insertion indicated that sqtcz35 might 
produce an amino-terminally truncated Sqt protein (T-Sqt), which 
would lack a signal sequence. This would require the ribosome to 
bypass 41 upstream AUGs or the initiation of transcription within 
the insertion itself (Fig. 1c). Although it is unlikely that T-Sqt is 
produced, we tested its activity. Injection of mRNA encoding T-Sqt 
did not induce phenotypic abnormalities (Fig. 1d–f). Hence, all available 
evidence indicates that sqtcz35 completely eliminates Sqt activity, 
whereas the splice-site morpholinos cannot eliminate the function 
of spliced maternal sqt mRNA. 

Maternal-zygotic and zygotic sqtcz35 and sqthi975 (a retroviral insertional 
allele) mutants have similar phenotypes. To investigate 
a requirement for maternal Sqt, we generated embryos lacking 



Figure 2 | The maternal Nodal genes cyclops and squint are not required for 
dorsal axis specification. a, c, e, g, Lateral views of live embryos 28 h post- 
fertilization. Genotypes are indicated on the left. Msqt and Mcyc;Msqt 
embryos appear phenotypically normal. The hatching gland, an anterior 
dorsal mesoderm derivative, is marked by red arrows. MZcyc;MZsqt 
embryos lack endoderm and head and trunk mesoderm but retain anterior 
neuroectoderm, including a cyclopic eye (black arrow), similar to Zcyc;Zsqt 
and MZoep mutants. b, d, f, h, Expression of hgg1, a marker for anterior 
dorsal mesoderm, detected by whole-mount in situ hybridization in 
10-somite-stage embryos (lateral view). We used cycm294, a mutation thought to 
eliminate all Cyc activity. Additional supporting data and details of 
methods are available from the authors at http://www.mcb.harvard.edu/ 
Schier/BennettBCASuppl07.pdf. 
maternal but not zygotic sqt, by crossing sqt homozygous mutant 
females and wild-type males. Maternal sqt mutants (Msqt) were 
viable and phenotypically normal (Fig. 2c). Analysis of markers 
expressed in axial, paraxial, intermediate, and lateral mesoderm 
did not detect any defects in Msqt embryos (Fig. 2d). 

To determine whether maternal sqt acts redundantly with mater- 
nal cyc, we analysed Mcyc;Msqt embryos. Mcyc;Msqt embryos gener- 
ated by germline replacement’ were viable and phenotypically 
normal (Fig. 2e). Hence, complete elimination of maternal Nodal 
signals does not affect zebrafish embryogenesis. To eliminate all early 
Nodal activity, we generated MZcyc;MZsqt embryos. These mutants 
developed dorsal derivatives such as the anterior neuroectoderm and 
appeared identical to Zcyc;Zsqt embryos and to MZoep'® mutants, 
which lack the Nodal co-receptor (Fig. 2g). 

These results cannot exclude potential contributions by maternal 
cyc or sqt under very particular genetic or environmental conditions, 
but we have shown that maternal cyc and sqt are not required 
for dorsal axis specification or for any other aspect of embryogenesis. 
We propose that Nodal signals act primarily as zygotic inducers of 
mesendoderm. 
Gore et al. reply 


Replying to: J. T. Bennett et al. Nature 450, doi: 10.1038/nature06314 (2007). 


We presented several lines of evidence indicating that dorso-ventral 
asymmetry is apparent in zebrafish embryos by cleavage stages, one 
of which showed that injection of three different morpholino oligonucleotides 
targeting three different squint (sqt) sequences causes 
severe disruption in dorsal structures. We concluded that the dorsal 
axis is evident by the 4-cell stage, and suggested that maternal Sqt and 
associated factors may function in zebrafish dorsal-axis formation’. 
Bennett et al. challenge our results obtained with morpholino oligo- 
nucleotides because they do not find a comparable defect in maternal 
and zygotic sqt mutants’. 

How could these differences be explained? Antisense RNAs target- 
ing short sequences can have off-target effects. One of the three 
morpholinos we used' was a previously described sqt ATG morpho- 
lino’, whereas the other two were directed against splice junctions. 
Eggs or embryos injected with control morpholinos did not manifest 
the same phenotypes. All three sqt morpholinos, when injected into 
fertilized embryos, recapitulated the milder-mutant phenotypes*”, 
which argues against off-target effects. Although all three morpholi- 
nos might have off-target effects that produce the same phenotype by 
chance, this explanation is unlikely. 

Bennett et al. also raise a concern related to our sqt splice-junction 
morpholinos. They contend that there is little evidence for maternal 
pre-messenger RNAs in eggs, and therefore that they cannot be tar- 
geted by splice-junction morpholinos. However, for several maternal 
transcripts in Xenopus, a pool of unprocessed RNA is present in eggs 
and early embryos. Bennett et al. use primers spanning one of the 
sqt introns and do not detect unspliced sqt RNA in ovaries, early 
embryos, or at the peak of zygotic sqt expression*®''. In further 
control experiments (Fig. 1), we consistently detect unspliced and 
spliced sqt RNA in zebrafish ovaries, eggs and early embryos, using 
primers for both sqt introns (ref. 12, and Fig. 1a–c). Furthermore, we 
detect aberrantly spliced and unspliced sqt RNA at the 8-cell stage 
(before zygotic sqt expression) on injection of sqt splice-junction 
morpholinos, but not of control morpholinos (Fig. 1c). Therefore, 
our use of splice-junction morpholinos to block maternal sqt func- 
tion cannot be excluded as a valid approach. 

It is also possible that the mutant used for genetic analysis may not 
behave as expected. The sqt insertion mutations are incompletely 
penetrant, sensitive to environmental conditions, genetic back- 
grounds and the age of the mother, and homozygous mutant embryos 
frequently survive to adulthood*’*"*. So the sqt alleles may not be 
complete nulls. Some maternal zygotic sqt*” (MZsqt*”) embryos 
manifest dorso—anterior deficiencies, similar to our findings after 
sqt morpholino injection. 

Bennett et al. contend that the sqt alleles must be null because the 
insertions should prevent translation of the Sqt protein. However, 
some RNAs have functions independent of the protein they encode: 
for instance, in Xenopus, removal of VegT transcripts disrupts the 
cytoskeleton at the vegetal cortex and prevents formation of germinal 
granules. We find that sqt RNA is present in MZsqt mutant embryos 
(Fig. 1d), and could perform a non-coding function. A priori con- 
siderations aside, to determine the loss-of-function phenotype 
Figure 1 | Unspliced sqt RNA is present in zebrafish ovaries, eggs and 
embryos. a, The squint locus and polymerase chain reaction (PCR) 
products. Red arrows, target sites for the sqt morpholinos. Black arrows, 
position of PCR primers. Primer pair AB amplifies part of intron I and exon 
II; primer pair CD spans intron II. Numbers in pink indicate nucleotide 
positions based on Vega v28 (http://vega.sanger.ac.uk/Danio_rerio/ 
exonview?transcript=OTTDART00000026522;db=core). b, Reverse 
transcription (RT)–PCRs for unspliced sqt intron I. Oligo(dT)- or random
hexamer-primed complementary DNAs synthesized from wild-type
zebrafish RNAs were used in PCRs with primer pair AB. A 402-bp unspliced
intron I product is detected in random-hexamer p(dN)6-primed RT–PCRs
at all stages. RT–PCRs on oligo(dT)-primed cDNA using the same primers
detect the 402-bp unspliced product strongly in ovary, 1,000-cell and 30% 
epiboly samples, but poorly in unfertilized eggs. Unspliced sqt RNA is not 
detected in oligo(dT)-primed cDNA from cleavage-stage embryos. No 
product is detected in RT controls for all stages. c, RT-PCRs for unspliced sqt 
intron II. Random hexamer-primed cDNA synthesized from wild-type 
zebrafish RNAs was used in PCRs with primer pair CD. A 696-bp unspliced 
intron II product is detected strongly in whole ovary and unfertilized eggs.
The unspliced intron II product diminishes by the 8-cell stage and is detected
again at 30% epiboly at the peak of zygotic sqt expression. Unfertilized eggs 
were injected with control or sqt morpholinos (MOs) and fertilized in vitro 
using sperm from wild-type male zebrafish. Only spliced sqt product
(615 bp) is detected in uninjected or control MO-injected embryos at the
8-cell stage, whereas in sqt MO2-injected embryos, unspliced sqt RNA
containing intron II is still detectable, together with an aberrantly spliced
sqt RNA species (469 bp). At 30% epiboly, both the 696-bp and the 469-bp products
are enriched in sqt MO2-injected embryos. The aberrant sqt splice product 
should generate truncated Sqt protein lacking the carboxy-terminal 98 
amino acids, which include six of seven conserved cysteine residues in the 
Sqt mature domain. No product is detected in RT controls. d, RT–PCRs for
sqt transcripts in MZsqt mutant embryos. Random hexamer-primed cDNA 
from 1-cell, 8-cell and 30% epiboly MZsqt mutant*'** embryos was used in 
PCRs with primer pair CD. Spliced sqt product (blue arrow) is detected in 
embryos from both MZsqt mutant alleles*'*'*. Unspliced sqt product (black 
arrow) is detected at the 30% epiboly embryonic stage. No product is 
detected in RT controls. S. Lim provided these data. Further details (for 
example, on the RT-PCR method) are available from the authors. 

definitively, a deletion that completely removes the sqt locus is
required. 

We therefore stand by our original conclusions’. Although the 
mutant analysis by Bennett et al.* disagrees with our results, we 
believe that further investigation is necessary to understand precisely 
how maternal Sqt functions. 

Evolution of animal personalities


Arising from: M. Wolf, G. S. van Doorn, O. Leimar & F. J. Weissing Nature 447, 581-584 (2007). 


Wolf et al.' propose a model to explain the existence of animal per- 
sonalities, consistent with behavioural differences among individuals 
in various contexts**—their explanation is counter-intuitive and 
cogent. However, all models have their limits, and the particular 
life-history requirements of this one may be unclear. Here we analyse 
their model and clarify its organismal scope. 

Under some conditions, Wolf et al.' find consistent behavioural 
differences between individuals that reproduce early in life and those 
that delay reproduction to explore their habitats instead to enhance 
future reproduction. Non-explorers that reproduce early in life 
later become bold and aggressive, whereas exploratory individuals 
with greater future reproductive potential are shy and unaggressive. 
These differences are caused by asset protection’, whereby individuals
with greater future fitness take fewer risks that would jeopardize that 
fitness. 

Asset protection, however, is a negative feedback process that, 
given time, makes individuals more alike, not less. In Clark’s original 
asset protection paper’, many decisions are made over an animal’s 
lifespan. Over time, individuals tend towards similar behaviour, des- 
pite any initial differences in assets, because those with assets take few 
risks and acquire little new fitness. Those without high assets take 
more risks and (unless they die trying) acquire new fitness assets that 
become worth protecting. 
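
The negative feedback described above can be sketched numerically. The following is a toy illustration only, not the model of Wolf et al. or of Clark: the function name, the starting assets and the rule that risk-taking (and hence asset gain) scales as 1/(1 + assets) are all assumptions chosen for simplicity, and mortality ("unless they die trying") is ignored.

```python
def simulate_asset_gap(low=0.0, high=5.0, steps=50, gain=1.0):
    """Track the asset gap between a poor and a rich individual over time.

    Hypothetical rule: willingness to take risks, and therefore the assets
    gained per step, falls as 1 / (1 + assets). Deaths are not modelled.
    """
    gaps = []
    for _ in range(steps):
        gaps.append(high - low)
        # The asset-poor individual takes more risk and gains more per step.
        low += gain / (1.0 + low)
        high += gain / (1.0 + high)
    gaps.append(high - low)
    return gaps


gaps = simulate_asset_gap()
print(f"initial gap: {gaps[0]:.2f}, final gap: {gaps[-1]:.2f}")
```

Under these assumptions the gap shrinks at every step, which is the sense in which asset protection alone tends to make individuals more alike over a long life.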

If, in the model of Wolf et al., individuals experience many hawk–dove
encounters, successful hawks would eventually accumulate
enough fitness for playing dove to become their optimal behaviour. 
Given time to accrue new assets, behavioural types would converge. 
Two particular conditions that could prevent this convergence are: 
animals with very short lives might not have time to change their 
assets sufficiently to cause changes in behaviour; and early life-history 
choices can have such large fitness consequences that subsequent 
bold and aggressive behaviour has relatively little influence on assets. 
Notably, these conditions do not seem to fit the maintenance of stable 
personalities in long-lived organisms such as humans. 

The model of Wolf et al. requires bold/aggressive contexts not to 
dominate one another in fitness consequences, otherwise the nega- 
tive feedback of asset protection will apply at this smaller scale 
(Supplementary Fig. 2 of ref. 1: in the square in which behavioural 
correlations could evolve, there is a wedge-shaped region without
correlation between the hawk–dove and predator games). We
reproduced their model and found that, in this region, thorough 
explorers are less aggressive than non-explorers, but no one is bold. 
Without the hawk–dove game, explorers would be shy and non-explorers
bold, but when the hawk–dove game has sufficiently higher
fitness consequences than the boldness game, all individuals are shy
to eliminate the risk of dying before the fitness windfall from the
hawk–dove game. This is the asset-protection principle, working
on the scale of the low-fitness behavioural contexts, producing beha- 
vioural inconsistency, unless the contexts do not dominate one 
another. 

An alternative way of explaining behavioural consistency and 
correlations is through positive (not negative) feedback. For 
example, if thorough explorers gain assets (energy, size, knowledge) 
that improve their abilities to escape predators or to win fights, then 
we might find positive correlations between exploration, boldness 
and aggressiveness. Additional behaviour would positively feed back 
on state, maintaining differences in assets and behavioural types. 
What is needed next is a unified modelling framework in which both 
negative and positive state feedback, as well as other mechanisms, can 
be compared. 

Wolf et al. reply


Replying to: R. McElreath, B. Luttbeg, S. P. Fogarty, T. Brodin & A. Sih Nature 450, doi: 10.1038/nature06326 (2007). 


The evolution of animal personalities is still poorly understood. The 
emergence of consistent individual differences is relatively easy to 
envisage when initial differences in behaviour are reinforced by pos- 
itive feedback mechanisms. Such reinforcement might act through 
learning or training, or through behaviour-induced changes in an 
individual’s condition’ or environment’. However, positive feedback 
is not required. We showed that, even without such feedback, differ- 
ences in fitness expectations result in consistent differences in risk- 
taking behaviour’. This was illustrated by a model that, for simplicity, 
considers a short life history. McElreath et al.* argue that our results 
extend to long-lived organisms only under specific conditions. 
Although we agree that the full scope and limitations of our model
still have to be mapped out, we believe that our arguments are also
relevant to long-lived organisms. 

Our theory is based on the principle of asset protection®: the more 
an individual stands to lose, the more cautiously it should behave. 
McElreath et al.* argue that asset protection entails a negative feed- 
back that tends to erode individual differences. This may indeed be 
the case if large assets can be accumulated by risky behaviour: risk- 
proneness while accumulating assets would then be followed by risk- 
aversion while protecting the acquired assets. However, the analysis 
of McElreath et al. is incomplete for at least two reasons. 

First, not all payoffs should be considered as assets. Payoffs can be 
either spent immediately in current reproduction or invested into 
future reproductive potential. Only the latter, resulting in an increase
in future reproductive value’, corresponds to assets. Consequently, 
when the payoffs of risky games only affect immediate reproduction, 
no asset accumulation takes place and there is no negative feedback 
eroding individual differences. There might even be positive feed- 
backs, enhancing individual differences, if risky payoffs tend to be 
immediate whereas non-risky payoffs tend to increase the future 
reproductive value. 

Second, McElreath et al. extrapolate our model to long-lived 
organisms in a one-sided manner. They assume that differences in 
assets due to life-history decisions only occur once in an individual’s 
lifetime whereas the number and importance of risky games increases 
with life expectancy. There are certainly examples where an individual’s
fate is governed by a single life-history switch. Yet, such ‘career
decisions’ are typically associated with long-lasting fitness consequences
that are not eroded by everyday risky behaviour. More
commonly, however, life-history decisions (such as thorough or 
superficial exploration) have to be taken repeatedly throughout an 
individual’s life. As a consequence, assets are not only eroded but can 
also be built up. 

In conclusion, the potential of negative feedback to erode indi- 
vidual differences is substantially smaller than McElreath et al. sug- 
gest. We therefore maintain that asset protection furthers the 
understanding of animal personalities in both short- and long-lived 
organisms. Yet, there are certainly situations in which negative feed- 
backs as described by McElreath et al. are important. In such situa- 
tions, a switch might occur from a risk-prone to a risk-averse 
personality. Indeed, personalities are not always stable from the cra- 
dle to the grave. Take our own species, where young individuals with 
a risky lifestyle become more cautious later in life (when assets are at 
stake). Similarly, hover wasps switch from risk-prone to risk-averse 
behaviour once they are close enough to the breeding position®. Our
theory accounts for such switches associated with asset accumulation 
and it produces testable predictions for their occurrence. Hence, even 
in the presence of negative feedbacks, the principle of asset protection 
is crucial for understanding animal personalities. 

The remarkable metal-catalysed olefin metathesis reaction

Amir H. Hoveyda & Adil R. Zhugralin


Catalytic olefin metathesis—through which pairs of C=C bonds are reorganized—transforms simple molecules to those 
that are complex and precious. This class of reactions has noticeably enriched chemical synthesis, which is the art of 
preparing scarce molecules with highly desirable properties (for example, medicinal agents or polymeric materials). 
Research in the past two decades has yielded structurally well-defined catalysts for olefin metathesis that are used to 
synthesize an array of molecules with unprecedented efficiency. Nonetheless, the full potential of olefin metathesis will be 
realized only when additional catalysts are discovered that are truly practical and afford exceptional selectivity for a
significantly broader range of reactions.


To appreciate the importance of catalytic olefin metathesis,
we must consider the power of chemical synthesis. The ability
to prepare molecules is crucial to advances in medicine,
biology and materials science’. Chemical synthesis challenges
and expands our understanding of the fundamental principles
of reactivity and selectivity, and gives us the opportunity to examine 
special molecules that were previously non-existent or available in 
such small quantities that a study was not feasible—for example, 
anticancer epothilones** and anti-hepatitis C agent 43 (see below). 
A recent remark by the director of the National Institutes of Health is 
apropos: “One interesting result of the NIH Roadmap development 
process came when we surveyed scientists to find out what the stum- 
bling blocks for biological sciences were. The number one stumbling 
block turned out to be synthetic organic chemistry.”

For synthetic chemistry to provide its full impact, certain advances 
must first be realized. One crucial area is catalyst discovery. Catalytic 
processes represent a degree of efficiency superior to those that 
require one or more equivalents of a reagent (ideally, one mole per 
cent catalyst or less should be used). Of particular significance is the 
identification of effective catalysts that are readily available, easy to 
handle, reliable, and promote transformations at high selectivity with 
a broad range of substrates with minimal waste generation. Catalytic 
olefin metathesis*’° is a ground-breaking advance that has signifi- 
cantly enhanced the power of chemical synthesis, and is likely to 
continue to do so. 


Olefin metathesis and its importance 


Olefin metathesis (‘metathesis’ from the Greek meaning ‘change of 
position, transposition’) reorganizes the carbon atoms of two C=C 
bonds (olefins or alkenes), generating two new ones; it promotes 
unique skeletal rearrangements, and is significant for several reasons. 
First, some olefins are easy to prepare and others require more effort 
to access. Terminal and some disubstituted alkenes are prepared 
with relative ease; tri- or tetrasubstituted olefins, on the other hand, 
present a challenge owing to higher levels of steric hindrance and 
complications associated with controlling cis and trans (or E and Z)
selectivity. Olefin metathesis allows facile access from the easily 
prepared olefins to those that are cumbersome to access. Efficient 
and stereoselective synthesis of the more substituted olefins is an 
important and largely unsolved problem in synthesis. Second, olefin 
metathesis reactions either do not generate a by-product or only 
produce one, such as ethylene, which can be removed by evaporation'’.
Third, chemists routinely use olefins to interconvert molecules.
Olefins are useful largely because they offer the best of
both worlds: stability and reactivity. Olefins are stable: they can typically
be stored indefinitely without decomposition. And yet, olefins
contain a π-bond that is sufficiently reactive to be used in a wide
range of transformations. 

The repertoire of olefin metathesis catalysts. Olefin metathesis 
may be classified into three categories: cross, ring-closing and ring- 
opening metathesis (Fig. 1)’*. Cross metathesis is the most pedagogic- 
ally relevant version’’. As shown in Fig. 1, with an appropriate 
catalyst, C1=C2 and C3=C4 can be transposed into C1=C3 and
C2=C4. It is perhaps difficult to see, at first glance, why one set of
olefins would be favoured; this is a key issue, as all olefin metathesis 
reactions are in principle reversible. The possibility that products 
might be re-converted to the starting materials dictates that chemists 
must design reactions that avoid back-tracking. 
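
The partner exchange in cross metathesis can be written out as a small sketch. This is bookkeeping only, under the assumption that each olefin is treated as an unordered pair of labelled carbon termini; the labels C1 to C4 and the function name are illustrative, and no catalyst, stereochemistry or energetics are modelled.

```python
def metathesis_products(olefin_a, olefin_b):
    """Enumerate the olefin pairs obtainable by swapping alkylidene halves.

    Each olefin is a 2-tuple of terminus labels, e.g. ("C1", "C2") for C1=C2.
    Returns both possible re-pairings, each as a pair of sorted tuples.
    """
    (a1, a2), (b1, b2) = olefin_a, olefin_b
    pairings = [((a1, b1), (a2, b2)), ((a1, b2), (a2, b1))]
    return [tuple(tuple(sorted(olefin)) for olefin in pair) for pair in pairings]


forward = metathesis_products(("C1", "C2"), ("C3", "C4"))
print(forward)

# Feeding one product pair back in regenerates the starting olefins,
# which is the reversibility that chemists must design around.
backward = metathesis_products(("C1", "C3"), ("C2", "C4"))
print(backward)
```

Because the same operation maps products back to starting materials, nothing in the bookkeeping favours either side; in practice the bias has to come from outside it, for example the loss of a volatile by-product or the release of ring strain.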

Another type, thus far the most widely used, is ring-closing 
metathesis (Fig. 1)'*. Here, two terminal alkenes react with the cata- 
lyst to generate a cyclic olefin, releasing a smaller olefin (C,=C, in 
Fig. 1). Ring-closing metathesis reactions can proceed to completion 
partly because volatile by-products are removed, trumping a reverse 
process. 

Finally, there is ring-opening metathesis’* (Fig. 1), through which 
a cyclic olefin reacts with a linear (acyclic) olefin, generating an 

Figure 1 | Different types of olefin metathesis. Cross metathesis, ring-closing metathesis and ring-opening metathesis: each represents a different type of
reaction and furnishes a different kind of product.

acyclic diene. The driving force is the release of strain’® in ring structures;
this also ensures minimal reaction back to the cyclic compound.
Reactions that involve cross metathesis are mechanistically
more complex, and controlling such transformations can be difficult 
(versus ring-closing). To promote cross metathesis, the catalyst must 
fuse together two different cross partners; otherwise, homodimeriza- 
tion predominates. Such complications do not occur in ring-closing 
metathesis where an intramolecular process is often preferred over an 
intermolecular one (unless strained and entropically disfavoured 
rings are sought’’). 

Catalysts that make it possible. Some of the olefin metathesis cat- 
alysts that are widely used and have served as the basis for a range of 
other systems are shown in Fig. 2. Centre-stage is a molybdenum 
(Mo; ref. 18) or a ruthenium (Ru; ref. 19) atom. The Mo=C or 
Ru=C double bonds (a Mo alkylidene or a Ru carbene) serve as 
points of contact between the catalyst and olefins. As we will see, 
the metal centres are crucial to the properties of these catalysts. 
Although complexes of other metals (such as tungsten'*”°”', rhe- 
nium” and osmium”’) promote olefin metathesis, these exhibit lower 
stability and/or reactivity, and have not been as extensively investi- 
gated. Development of such catalysts, however, is a compelling future 
objective, because—similar to Mo and Ru systems—additional metal 
complexes are likely to provide unique or complementary reactivity 
and/or selectivity profiles. 

Mo catalyst 1”, prepared and handled under inert atmosphere, is 
generally more active than Ru catalysts 2*** and 3” (Fig. 2), which 
are stable to air and moisture. The activities of Mo and Ru catalysts are,
to a large degree, complementary"®. Ru catalysts may be used with 
substrates that carry an alcohol, a carboxylic acid, or an aldehyde, but 
can be rendered inactive in the presence of structurally exposed 
amines” and phosphines*’; the reverse holds for Mo catalysts'®. 
Metal complexes 1-3, as well as a number of Ru-based deriva- 
tives, are commercially available. 

Mo and Ru catalysts 4-6***” (Fig. 2) are chiral. Handedness, or 
chirality, is an attribute of many molecules of life; polypeptides, 
nucleic acids, carbohydrates, and numerous naturally occurring 
molecules that exhibit biological activity exist as a single enantiomer 
and consist of enantiomerically pure smaller units. A critical 
objective of modern chemistry is the development of catalysts** that 
promote the formation of chiral molecules of high enantiomeric
purity (ideally >98% purity) from readily available and inexpensive 
achiral ones*’. Catalysts 4-6 initiate reactions that favour formation 
of one enantiomer. Thus far, Mo-based chiral catalysts, one of which 
is commercially available’, promote ring-closing metathesis with 
higher enantioselectivities for a wider range of substrates'****°; both 
classes are effective in enantioselective ring-opening/cross meta- 
thesis, affording carbo-*”*' and heterocyclic products***’ (see below). 
The development of chiral olefin metathesis catalysts is less advanced 
than that of the achiral variants, and many important discoveries 
remain to be made. For instance, there are no effective chiral catalysts 
for enantioselective cross metathesis*’. 


Catalytic ring-closing metathesis 


A general pathway“ illustrating how such transformations proceed is 
shown in Fig. 3A. All olefin metathesis reactions involve association of 
the metal with an olefin substrate”. It is in this crucial interaction 
that one significant difference between Mo- and Ru-based catalysts 
presents itself. A high-oxidation-state Mo centre (+6) is a Lewis acid 
that chelates with a Lewis basic olefin; in contrast, in Ru catalysts, it is 
the alkene substrate that serves primarily as a π Lewis acid**””.

Overall, the catalytic cycle (Fig. 3A) consists of an initiation phase 
(generation of the active complex) and a propagation phase (the 
active complex promotes additional cycles). Catalysis commences 
by a cross metathesis between an active carbene or alkylidene 
(M=C) and one of the two olefins of the substrate (i) to generate a 
metallacyclobutane (ii)***’. The metallacyclobutane might revert to i 
and M=C (pathway a) or the other two bonds of the ring might be 
ruptured, furnishing iii, where the metal (M) is within the substrate 
(pathway b). Formation of another metallacyclobutane (iv) and its 
disintegration furnishes cyclic product v and M=C3 (vi), which is the
metal-bearing agent serving as the catalyst. What typically drives 
reactions is that the cyclic product (v) does not easily react with 
the active catalyst (M=C3) to cause ring-opening metathesis. 

The identity of the intermediates in the catalytic cycle is well 
understood”; it is, however, often unclear whether it is catalyst— 
substrate association (i and vii first chelate with the metal centre of 
M=C before conversion to ii or viii), formation of the metalla- 
cyclobutane” or its cleavage that is the irreversible, product- or 

Figure 2 | Representative olefin metathesis catalysts. Catalysts 1–3 (top) are among the most commonly used achiral catalysts. Catalysts 4–6 (bottom) are
chiral, and can react at different rates with two enantiomers of a substrate (kinetic resolution) or can convert an achiral molecule to a chiral one with
preference for one of the two enantiomers (asymmetric synthesis). Mo-based catalysts 1 and 4 are air sensitive but generally more active than air-stable
Ru-based catalysts 2, 3 and 5, 6, respectively.

rate-determining step (Fig. 3A). It is a daunting task to predict a
‘most effective catalyst’ or to design one, in the true sense of the
word: what differentiates a selective process from one that is
non-selective is a mere 2–2.5 kcal mol⁻¹ difference in activation barriers
(for comparison, rotation around the C–C bond of ethane requires
only ~3 kcal mol⁻¹), and seemingly insignificant alterations in the
substrate structure or conditions can change the energetics of the
catalytic cycle (including which step is rate-determining)™. These considerations
indicate that seeking a truly ‘general catalyst’ is likely to be futile:
different classes of substrates may require a different ‘optimal’
catalyst. Chemists address such challenges through invention of a 
class of catalysts (as opposed to one compound) that are easily modi- 
fied to achieve maximum reactivity and/or selectivity. The more 
readily modifiable a catalyst class, the larger the number of available 
catalysts, and the better the odds of obtaining more desirable 
results’®. 
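
To see how small a barrier difference of about 2 kcal mol⁻¹ is, and how much selectivity it nevertheless buys, the product ratio can be estimated from a Boltzmann factor. The sketch below assumes kinetic control, two competing pathways that differ only in their activation free energies, and a temperature of 298.15 K; the function name is illustrative and none of these assumptions comes from the article.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def selectivity_from_barrier_gap(ddg_kcal, temperature_k=298.15):
    """Fraction of the major product for a given barrier difference.

    Assumes the rate ratio of two competing pathways is exp(ddG / RT).
    """
    ratio = math.exp(ddg_kcal / (R_KCAL * temperature_k))
    return ratio / (1.0 + ratio)


for ddg in (1.0, 2.0, 2.5):
    fraction = selectivity_from_barrier_gap(ddg)
    print(f"{ddg:.1f} kcal/mol -> {100 * fraction:.1f}% major product")
```

At room temperature a gap of 2 kcal mol⁻¹ corresponds to roughly 97% of the major product, so shifts of a fraction of a kcal mol⁻¹ in either barrier are enough to move a reaction between selective and unselective.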

There is little doubt that ring-closing metathesis has elevated the 
art and science of chemical synthesis****. Two ring-closing metathesis 
reactions that have been carried out for the synthesis of the large rings 
of two natural products are shown in Fig. 3B. The linear chain of 7, 
with Lewis-basic and potentially catalyst-deactivating oxygen- 
and nitrogen-containing groups, is converted to the 14-membered 
ring lactam 8 by catalyst 1°”. In two additional steps, the product of
ring-closing metathesis (8) is transformed to the antifungal and anti-influenza
agent fluvirucin B1**. The volatile ethylene (H2C=CH2)
is the by-product.

It is easy to appreciate the power of catalysis when one considers 
the alternative pathways—exits on a reaction highway either avoided 
or traversed reversibly. The challenge in any ring-closing is that 
a cross metathesis, such as generation of 9 (Fig. 1), represents a 
competitive route; the catalyst must be sufficiently active to reverse 
this ‘wrong turn’. The two C=C bonds in 7 are not identical: one is 
the more accessible and reactive monosubstituted alkene. Formation 
of 9 is therefore of particular concern, since it arises from coupling of 
two molecules of 7 by a reaction of the less hindered olefins. How is it, 
then, that 8 can be prepared in 92% yield? The key lies in the rever- 
sibility of olefin metathesis. Substantial amounts of 9 are indeed 
formed, but catalyst 1 reverts 9 to 7. Macrocyclic 8, on the other 
hand, does not undergo further reaction (ring-opening); the central 
olefin of 9 is less hindered than the olefin in 8 (trisubstituted). The 
success of the seemingly straightforward closure of 7 to afford 8 
involves several nuances. Efficient synthesis of the large ring depends 
on striking a balance among cross metathesis, a process that delivers 
the undesired coupling of two substrate molecules, ring-closing 

Figure 3 | The general mechanism for ring-closing olefin metathesis. This mechanism involves the intermediacy of metallacyclobutanes (A); also shown are
two examples (B and C) that illustrate the power of catalytic ring-closing metathesis in the total synthesis of biologically active natural products (M = Mo or 
Ru). In both cases, the reversible nature of cross metathesis is critical to efficient synthesis of the desired macrocycles. 

metathesis, a reaction that affords the desired product, and a ring-opening
metathesis pathway that would dismantle the desired cyclic
compound. 

Another revealing example is the ‘stitching’ of two molecules of 10 
to access 22-membered ring 12 en route to the cytotoxic agent cylin- 
drocyclophane F (Fig. 3C)*. The homodimerization of 10 must 
transpire in the desired fashion, as the two olefins within each 
molecule of 10 are different. The coupling can occur in two ways: 
head-to-head or head-to-tail, and the head-to-tail is needed for the 
subsequent ring-closing metathesis that affords 12. The successful 
synthesis of 12 is not due to preferable formation of 11 over the 
alternative head-to-head product—it is because ring-closing meta- 
thesis of 11 is more favoured. Detailed investigations reveal that the 
head-to-head product is generated but is converted back to 10, which 
undergoes coupling to generate 11. The reversible nature of olefin 
metathesis proves crucial again. 

In the case of 8, the olefin, generated through olefin metathesis, is 
erased by palladium-catalysed hydrogenation to establish the distal 
methyl-bearing stereogenic centre. To untrained eyes, fluvirucin B1
or cylindrocyclophane F are structures that do not contain an olefin 
within their macrocycle, and the idea of using olefin metathesis to 
access these molecules may not be evoked. When a synthetic chemist 
contemplates a target from the perspective of olefin metathesis, she 
must choose, among various possibilities, where to place an olefin in 
the precursor substrate, all signs of which may subsequently be 
erased. In designing a route, she commits to a strategy that predicates 
the identity of the bonds that realize their union. The chemist makes a 
move, one that affects the others that will follow?”. 

The total synthesis presented in Fig. 4 is based on the insight that 
the architecture of longithorone A can be born through coupling 
of two (nearly identical) smaller building blocks 16 and 19°. 
Cyclizations of 13 and 17, providing the 15-membered rings of 15
and 18 (Fig. 4), are catalysed by Ru complex 14”, an earlier version of
2 (Fig. 2). These ring-closing reactions involve an alkyne in processes
referred to as enyne metathesis® °°. Products 15 and 18 are trans- 
formed to 16 and 19 such that these unsaturated macrocycles can be 
directed to participate in two separate Diels-Alder additions, cour- 
tesy of enyne metathesis products. The protagonists in the first act are 
identified in red: the 1,3-diene of 19 (red) reacts with the olefin of 16 
(red) to join the two pieces while generating a cyclohexene (red). This 
is followed by another cycloaddition between the 1,3-diene (blue) 
and an olefin (blue) that resides within 16. 


Catalytic ring-opening and cross metathesis reactions 


A general scheme illustrating the mechanism of ring-opening meta- 
thesis is depicted in Fig. 5. A cyclic olefin (ix) joins with a catalyst 
(M=C) to generate a metallacyclobutane (x), which may collapse to 
furnish a metal-containing intermediate xi. Acyclic xi can react with 
another acyclic olefin (C3=C4, versus cyclic olefin ix) to yield another
metallacyclobutane (xii). Rupture of xii furnishes the final product 
(xiii) and affords the active catalyst (xiv), which propagates addi- 
tional cycles. The second olefin metathesis sequence, involving con- 
version of xi to xiii, constitutes a cross metathesis, and hence the 
concatenation of events that converts ix to xiii is referred to as ring- 
opening/cross metathesis. If the cyclic olefin proves more reactive 
than the cross partner, or if a cross partner (for example, C3=C4) is
absent, xi can react with another molecule of ix. The resulting M=C 
transforms another ix and soon the active species romps through the 
substrate molecules, generating polymeric products. Such a process 
is referred to, appropriately, as ROMP (ring-opening metathesis 
polymerization). Alternatively, if the cross partner is more reactive 
(versus cyclic olefin), the major product arises from homo-coupling 


Figure 4 | A concise synthesis of longithorone A. This synthesis involves coupling of segments 16 and 19, both of which were prepared by ring-closing
ene-yne metathesis reactions of 13 and 17, respectively. TBS (t-butyldimethylsilyl) is a common protecting group that masks alcohols.

Figure 5 | General mechanism for catalytic ring-opening/cross metathesis. Ring-opening metathesis is initiated by the rupture of a cyclic C–C
double bond (olefin) by a catalyst (M=C). This affords a new alkylidene (xi), which undergoes cross metathesis with another olefin to furnish the final
product (xiii).

(cross metathesis). To achieve high yields of ring-opening/cross 
metathesis, the catalyst must react with the two substrates in the 
proper sequence with high precision. Such considerations pose 
demanding challenges in catalyst development. One recent approach 
involves catalysts that bear a stereogenic metal centre (for example, 6 
in Fig. 2 and 49 below), leading to the intermediacy of two diaster- 
eomeric metal-carbene intermediates (M=C), each reacting prefer- 
ably with the cyclic olefin or the acyclic cross partner*”®*. 

Small cyclic alkenes are strained'’: they contain a substantial amount
of energy that can be released on rupture (up to ~55 kcal mol⁻¹; for comparison, a
~2 kcal mol⁻¹ difference between two pathways means ~97% selectivity).
Strain energies have been exploited to promote ring-opening/
cross metathesis reactions'*”. Two examples are presented in Fig. 6. 
In a total synthesis of cytotoxic agent bistramide A%’, cyclopropene 20 
participates in a catalytic ring-opening/cross metathesis with acyclic
olefin 21 to yield 22. In a subsequent cross transformation, the more
accessible alkene of 22 (blue) is set up to react with another acyclic olefin 
(23) to furnish 24 (and gaseous ethylene as by-product). Two olefin 
metathesis reactions are therefore used to stitch three molecules—20, 
21 and 23—together, and, in short order, a complex molecule that 
constitutes a significant portion of bistramide A is fabricated. 
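
The parenthetical figure quoted above (a ~2 kcal mol⁻¹ difference between two pathways giving ~97% selectivity) follows from the Boltzmann relation between a free-energy gap and a product ratio. The sketch below reproduces that arithmetic for two competing pathways under kinetic control. The temperature of 298.15 K and the function name are assumptions made for illustration; the text does not state a temperature.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def selectivity_from_gap(ddg_kcal: float, temperature_k: float = 298.15) -> float:
    """Fraction of product formed through the lower-energy pathway.

    Assumes exactly two competing pathways whose rates differ only through
    the free-energy gap ddg_kcal (kcal/mol) between their transition states.
    """
    ratio = math.exp(ddg_kcal / (R_KCAL * temperature_k))  # k_major / k_minor
    return ratio / (1.0 + ratio)


if __name__ == "__main__":
    for gap in (0.5, 1.0, 2.0, 3.0):
        share = 100 * selectivity_from_gap(gap)
        print(f"{gap:.1f} kcal/mol gap -> {share:.1f}% via the favoured pathway")
```

Under these assumptions a 2 kcal mol⁻¹ gap corresponds to roughly 97% of the product arising from the favoured pathway, which shows how small energy differences translate into useful selectivity.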

In an enantioselective total synthesis of baconipyrone C, a 
siphonariid metabolite isolated from the false limpet Siphonaria baconi 
(Fig. 6), the plane of symmetry within achiral 25 is removed through 
a ring-opening/cross metathesis with styrene promoted by chiral 
catalyst 6 (Fig. 2), yielding heterocycle 26 with ~95% selectivity 
(89% enantiomeric excess). After conversion of 26 to 27, another 
sequence converts 27 to 30 via 28 and 29. A ring-closing 
olefin metathesis, this time promoted by achiral catalyst 3 (Fig. 2), 
furnishes allylic alcohol 30 en route to the target. 
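
The two figures quoted for heterocycle 26 (~95% selectivity and 89% enantiomeric excess) describe the same outcome in different units. A minimal sketch of the conversion follows; the function names are chosen here for illustration only.

```python
def ee_from_major_fraction(major_fraction: float) -> float:
    """Enantiomeric excess (as a fraction) from the mole fraction of the major enantiomer."""
    if not 0.5 <= major_fraction <= 1.0:
        raise ValueError("major_fraction must lie between 0.5 and 1.0")
    return 2.0 * major_fraction - 1.0


def major_fraction_from_ee(ee: float) -> float:
    """Mole fraction of the major enantiomer from the enantiomeric excess (as a fraction)."""
    if not 0.0 <= ee <= 1.0:
        raise ValueError("ee must lie between 0.0 and 1.0")
    return (1.0 + ee) / 2.0


if __name__ == "__main__":
    share = 100 * major_fraction_from_ee(0.89)
    print(f"89% ee corresponds to {share:.1f}% of the major enantiomer")
```

An 89% enantiomeric excess corresponds to a 94.5:5.5 mixture of enantiomers, which the text rounds to ~95% selectivity.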

Figure 6 | Catalytic ring-opening/cross metathesis provides uniquely efficient pathways for synthesis of biologically active natural products. In the 
total synthesis of bistramide A, a Ru-catalysed ring-opening/cross metathesis is followed by another cross metathesis to furnish 24. In the total synthesis 
of baconipyrone C, a chiral Ru catalyst converts achiral 25 to pyran 26 with strong preference (95%) for one enantiomer. Bn (benzyl) and PMB 
(p-methoxybenzyl) are protecting groups that prevent alcohols from undergoing reaction. Dashed boxes represent segments of the target molecules prepared 
through catalytic olefin metathesis. 

Figure 7 | Catalytic C-H activation is coupled with catalytic olefin metathesis for a net ‘alkane metathesis’ process. 


Thus far, we have considered how, after ring-closing, ring-opening/ 
cross or cross metathesis, olefins within the product might be masked 
by a subsequent reaction. The sequence in Fig. 7 illustrates a 
transformation that takes this ‘vanishing act’ to another level: although 
catalytic olefin metathesis is involved, neither the products nor the 
starting materials contain an olefin. The alkenes needed are obtained 
in the course of the transformation by iridium-based catalyst 32, 
which converts two equivalents of the saturated hydrocarbon (alkane) 
31 to two of alkene 34 by C-H activation; this process transforms 32 
to iridium-dihydride 33. Mo catalyst 1 (Fig. 1), present in the mixture, 
waiting for olefins to become available, connects with two molecules 
of 34 and transforms them to 35 and ethylene (cross metathesis). 
Alkene 35 can revert to 34 after cross metathesis with ethylene, 
or, alternatively, its alkene may undergo isomerization to generate 36, 
in a reaction that is also catalysed by iridium complex 32. Alkene 36 
participates in a Mo-catalysed cross metathesis with ethylene to 
produce two new terminal alkenes 37 and 38, which can be converted to 
alkanes 39 and 40 by iridium-dihydride 33, delivering back the 
hydrogens it borrowed from 31 to the olefins of 37 and 38. The last reaction 
leads to re-formation of 32 from 33, which moves on to afford alkene 
34 and initiate additional cycles. One feature of this molecular 
choreography is that the sizeable iridium dihydride 33 deposits its two 
hydrogens on the terminal alkenes of 37 and 38 but not on the more 
hindered internal olefins of 35 or 36. On the other hand, if 33 adds 
its hydrogen atoms to the olefin in 34, substrate 31 is generated again 
and the catalytic cycle begins anew. 

[Figure 8 scheme conditions: catalyst 42, toluene, 80 °C, >400 kg scale; 83% yield, >99% cis alkene.] 

With a two-catalyst approach for alkane cross metathesis in hand, will 
it be feasible to design a catalytic alkane ring-closing metathesis? In 
cases where a cycloalkane is desired, such a direct process could 
be more efficient than a two-step, alkene ring-closing metathesis/ 
hydrogenation sequence. An inherent advantage of catalytic olefin 
metathesis is, however, the range of useful possibilities that the rich 
chemistry of olefins offers. Nevertheless, such considerations have led 
to the development of tandem catalytic reactions carried out in a 
single vessel, promoted by one catalyst and involving olefin 
metathesis reactions; Ru-catalysed olefin metathesis/hydrogenation and 
olefin metathesis/olefin isomerization are representative examples. 


Catalytic olefin metathesis and a better quality of life 
Through a select few examples (there are many more), we have 
seen how catalytic olefin metathesis has allowed chemists to 
synthesize medicinally relevant agents in entirely new ways. The utility of 
this class of reactions extends beyond small-scale laboratory 
preparation. A catalytic ring-closing metathesis promoted by catalyst 42, a 
precursor to 3 (Fig. 2), has been used by scientists at 
Boehringer-Ingelheim to prepare multi-kilogram quantities of 43 (Fig. 8), a 
complex molecule that is a precursor to a potent anti-hepatitis C 
agent. 


Figure 8 | Catalytic olefin metathesis has been used in the large-scale preparation of pharmaceutical candidates. An example is the potent hepatitis C 
protease inhibitor BILN 2061 ZW developed by Boehringer-Ingelheim. Macrocyclic intermediate 43 has been prepared in multi-kilogram quantities as a 
single C=C (olefin) isomer with high efficiency. 

The influence of olefin metathesis has spread beyond biological 
chemistry. Large-scale preparation of common organic feedstock 
chemicals and polymers is one area that has been affected. Polymers 
can be produced under mild conditions with commercial catalysts 
through ROMP processes. Commercially produced ROMP polymers 
include polyoctenamer (Vestenamer), polynorbornene (Norsorex) 
and polydicyclopentadiene (polyDCPD, Metathene, Metton, 
Pentam, Prometa, Telene). Fully hydrogenated analogues of some 
of these polymers are available as well; Zeonex is a saturated 
ROMP polymer of substituted polynorbornene. 

Perhaps the most widely commercialized ROMP polymer is 
polydicyclopentadiene, which is prepared from endo-dicyclopentadiene, 
a by-product in naphtha crackers. Various ‘ill-defined’ olefin 
metathesis catalysts are used to manufacture polydicyclopentadiene. For 
example, in a process that affords Telene, 
tetrakis(tridodecylammonium)octamolybdate serves as a precatalyst along with an activating 
mixture that consists of EtAlCl₂, SiCl₄ and propanol. A crucial 
advantage of well-defined Ru-based catalysts is the high degree of 
polymerization they achieve, an attribute that removes the odour that emanates 
from the unreacted dicyclopentadiene monomer in 
polydicyclopentadiene. Polydicyclopentadiene is easily castable and mouldable by 
reaction injection moulding technology; moreover, owing to its dur- 
ability and resistance to corrosion, polydicyclopentadiene has found a 
wide range of applications in heavy machinery manufacturing (for 
example, agricultural equipment). 

Although numerous polymers are used in the manufacturing of 
high-end sporting and recreational goods, day-to-day items, medical 
instruments and electronics, and have applications in optics, access 
to other ROMP polymers requires development of more efficient 
and ‘smarter’ catalysts. Within this context, Chen has devised an 
approach to the synthesis of alternating copolymers that is based 
on differential reactivity of diastereomeric carbene intermediates in 
ROMP processes. These transformations are promoted by complex 
49, which bears a stereogenic Ru centre (hence the involvement of 
stereoisomeric carbenes). Cyclooctene is thus co-polymerized with 
norbornene to afford an alternating copolymer, whereas 
copolymerization of these monomers under standard protocols delivers only a 
random copolymer. Chen’s method is proof-of-principle; future 
research is required to address the need for active catalysts that yield 
a variety of alternating ROMP copolymers. 

Block copolymers are poised to have a significant impact on nano- 
technology. One approach to effective synthesis of these macromo- 
lecules involves ROMP processes promoted by rapidly initiating 
olefin metathesis catalysts. A new application for block copolymers, 
disclosed by General Electric, involves an inorganic-organic block 
copolymer of polynorbornene-decaborane as a single-source ceramic 
precursor, which was prepared by co-polymerization of norbornene 
with 6-(norbornenyl)decaborane. This copolymer was subsequently 
converted to nanostructured boron carbonitride and mesoporous 
boron nitride. In another development, Nuckolls has shown 
that a functionalized Ru-metal surface may catalyse olefin 
metathesis. Thus, ROMP on such a surface could furnish conducting 
polyene nanowires. 

The robustness of the ROMP polymers, and of the catalytic pro- 
cesses used to prepare them, has allowed the ROMP reaction to be 
incorporated into manufacturing methods. For example, olefin 
metathesis has been used in the design of self-healing materials. 
Cracks, resulting from wear and tear, that appear in structural poly- 
mers can be made ‘self-healing’ by incorporation of small vesicles of 
dicyclopentadiene monomer, a highly effective ROMP substrate, 
within the material in question, together with small amounts of an 
olefin metathesis catalyst. If microcracking occurs, vesicles of mono- 
mer interact with the catalyst to elicit fast polymerization, resulting in 
formation of a robust polydicyclopentadiene plug in the place of the 
crack. 


Challenges that lie ahead 


Many critical discoveries in catalytic olefin metathesis remain to be 
made. One area relates to the development of more robust yet active 
catalysts that are easily prepared and provide exceptional control of 
stereochemistry. Discovery of catalysts that allow for control of olefin 
stereoselectivity to obtain the thermodynamically less favoured cis 
disubstituted olefins, or those that promote formation of E or Z 
trisubstituted alkenes, is a significant challenge, and would constitute 
a major contribution to olefin metathesis. Identification of chiral 
catalysts that furnish high enantioselectivity for a range of substrates 
and reaction classes, such as enantioselective cross metathesis or 
enyne metathesis, is an exciting problem that is yet to be addressed. 

Figure 9 | More recent and modified variants of Ru-based olefin metathesis catalysts 44-49 and an easier way to access a range of chiral Mo catalysts. 

Although many of the transformations presented above furnish 
desirable products, the amount of catalyst required for an efficient, 
cost-effective process is too high. In certain cases, as much as 
50 mol% (half the amount of the substrate) of a catalyst is needed 
(for example, Fig. 4); but even a 3% loading is considered excessive 
for a ‘real-world’ industrial process (for example, Fig. 8). High 
loadings are often the result of less than optimal catalyst lifetimes. Catalyst 
efficiency is, therefore, about more than simply faster rates, or high 
turnover frequency: it is crucial that high turnover numbers are 
achieved. Discovery of active and selective catalysts that are more 
robust will require a detailed understanding of often unexpected 
and abstruse deactivation pathways. 
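
The link between catalyst loading and turnover number can be made concrete. The sketch below assumes the simplest definition (moles of product formed per mole of catalyst charged) and a hypothetical quantitative yield, so the values are upper bounds rather than numbers reported for Fig. 4 or Fig. 8. The function names are illustrative.

```python
def turnover_number(loading_mol_percent: float, yield_fraction: float = 1.0) -> float:
    """Moles of product per mole of catalyst.

    loading_mol_percent is the catalyst loading relative to substrate (mol%);
    yield_fraction is the fractional yield of product (1.0 = quantitative).
    """
    if loading_mol_percent <= 0:
        raise ValueError("loading_mol_percent must be positive")
    return yield_fraction * 100.0 / loading_mol_percent


def turnover_frequency(ton: float, reaction_time_h: float) -> float:
    """Average number of turnovers per hour over the whole reaction time."""
    if reaction_time_h <= 0:
        raise ValueError("reaction_time_h must be positive")
    return ton / reaction_time_h


if __name__ == "__main__":
    for loading in (50.0, 3.0, 0.1):
        ton = turnover_number(loading)
        print(f"{loading:g} mol% loading -> at most {ton:.0f} turnovers per catalyst molecule")
```

At 50 mol% each catalyst molecule can turn over at most twice, and at 3 mol% about 33 times, which is why longer catalyst lifetimes matter more than faster initial rates.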

Removal of trace metal impurities from olefin metathesis products 
is another complication of note, particularly in cases where com- 
pounds will be used in clinical trials. This brings us to the need for 
catalysts that can be easily recovered and re-used; effective catalysts 
bound to polymeric surfaces can minimize impurity levels and are 
economically attractive. In many olefin metathesis reactions that 
involve formation of large rings, high-dilution conditions are 
required, rendering the use of such processes difficult and prohibi- 
tively expensive in larger industrial-scale conditions. Catalysts 
designed to discourage competitive homodimerization of two sub- 
strate molecules (versus cyclization of one substrate molecule) are 
therefore needed. 

With better catalysts, the currently untapped power of olefin 
metathesis will give rise to increasingly exciting applications. New- 
generation catalysts with improved properties are beginning to 
emerge. Notable examples are modified Ru catalysts 44 and 45 
(Fig. 9), which can in some cases furnish faster reactions than the 
parent complex 3 (but, at times, this comes at the cost of lower 
catalyst stability), or those that can be used to catalyse reactions in 
water (46). Ru catalysts have been attached to sol-gel glass surfaces 
to give ‘tablets’ (47); when the tablets are placed in solution, reaction 
with substrates causes catalyst release, leading to efficient reactions. 
The tablets can be removed simply with a pair of tweezers (no 
filtration and washing) and re-used up to 20 times. Catalyst 48 has been 
used to synthesize cyclic polymers by procedures that do not require 
linear precursors. The unique features of complex 49, which, 
similar to 6, contains a Ru stereogenic centre, have already been 
discussed. An important area of research involves development 
of highly reactive, but sensitive, catalysts by user-friendly in situ 
preparation methods that begin with relatively easy-to-handle 
precursors (50 and 51 to give chiral Mo catalyst 4). 

Nature presents us with a range of architectures that are diverse in 
size, complexity and function. Chemical synthesis—accessing mole- 
cules revealed to us by our imagination—is crucial to our ability to 
produce compounds that are not found in nature, but are perhaps 
equally as enriching. Catalytic olefin metathesis is a component of 
making such dreams come true. There is, however, far more that we 
cannot do. The little that we can do is in need of substantial improve- 
ment. To consider catalytic olefin metathesis, or chemical synthesis, a 
consummated field would be akin to suggesting to Henry Ford that 
his Model T was the be-all and end-all as far as automobiles are 
concerned. 

Hedgehog regulates smoothened activity 
by inducing a conformational switch 


Yun Zhao, Chao Tong & Jin Jiang 


Hedgehog (HH) morphogen is essential for metazoan development. The seven-transmembrane protein smoothened (SMO) 
transduces the HH signal across the plasma membrane, but how SMO is activated remains poorly understood. In Drosophila 
melanogaster, HH induces phosphorylation at multiple Ser/Thr residues in the SMO carboxy-terminal cytoplasmic tail, 
leading to its cell surface accumulation and activation. Here we provide evidence that phosphorylation activates SMO by 
inducing a conformational switch. This occurs by antagonizing multiple Arg clusters in the SMO cytoplasmic tail. The Arg 
clusters inhibit SMO by blocking its cell surface expression and keeping it in an inactive conformation that is maintained by 
intramolecular electrostatic interactions. HH-induced phosphorylation disrupts the interaction, and induces a conformational 
switch and dimerization of SMO cytoplasmic tails, which is essential for pathway activation. Increasing the number of 
mutations in the Arg clusters progressively activates SMO. Hence, by employing multiple Arg clusters as inhibitory elements 
counteracted by differential phosphorylation, SMO acts as a rheostat to translate graded HH signals into distinct responses. 


The HH morphogen controls many key developmental processes, with 
different thresholds specifying distinct outcomes. In Drosophila 
wing discs, HH proteins secreted by posterior (P) compartment cells 
move into the anterior (A) compartment to form a local 
concentration gradient. Low levels of HH suffice to induce the expression of 
decapentaplegic (dpp), whereas high levels are required to induce 
patched (ptc) and engrailed (en) (Supplementary Fig. 1). 

The reception system for HH consists of a twelve-transmembrane 
protein, PTC, as the HH receptor and a seven-transmembrane 
protein, smoothened (SMO), as the signal transducer. In Drosophila, 
HH binding to PTC abrogates its inhibition of SMO and induces 
extensive phosphorylation of the SMO cytoplasmic tail by protein 
kinase A (PKA) and casein kinase I (CKI), leading to SMO cell surface 
accumulation and activation. How phosphorylation promotes 
SMO cell surface accumulation is not understood. In addition, phos- 
phorylation may regulate SMO activity through mechanism(s) other 
than controlling its cell surface abundance. 


Regulation of SMO by multiple Arg clusters 

Our previous study indicates that phosphorylation may regulate 
SMO cell surface abundance by preventing its endocytosis 
and/or promoting its recycling. To investigate further how SMO 
cell surface expression is regulated, we generated a set of C-terminally 
truncated SMO variants and examined their subcellular localization 
using a cell-based assay (Fig. 1). Deletion up to amino acid 818 did 
not significantly change SMO subcellular distribution; however, 
further deletions resulted in progressively increased cell surface 
expression (Fig. 1a, c), implying that multiple negative regulatory elements 
exist between amino acids 661 and 818. 

SMOΔC710 exhibits consistently higher cell surface expression 
than SMOΔC730 (Fig. 1c), indicating that amino acids 710-730 
may harbour a negative element(s). Ala-scan mutagenesis, which 
substituted multiple residues with Ala, identified the Arg residues in 
RRTQRRR as critical for preventing SMO cell surface accumulation 
(Fig. 1b, c; data not shown). Interestingly, multiple Arg clusters, 
arbitrarily named R1 to R4, are located between amino acids 661 and 
818, a region critical for blocking SMO cell surface accumulation 
(Fig. 1d). We therefore introduced into the full-length SMO Arg to 
Ala (RA) mutations in individual, or combinations of, Arg clusters. 
SMO variants with one Arg cluster mutated did not exhibit a 
significant change in their cell surface expression; however, mutating two or 
more Arg clusters caused a gradual increase in SMO cell surface 
expression (Fig. 1d-f; data not shown), suggesting that multiple 
Arg clusters cooperate to restrict SMO cell surface accumulation. 

To determine whether the Arg clusters negatively regulate SMO 
activity, SMO variants with one or more mutated Arg clusters were 
expressed in wing discs using the MS1096 Gal4 driver. SMO variants 
with one mutated Arg cluster exhibited low levels of basal activity 
similar to that of wild-type SMO, as is evident from the ectopic 
expression of dpp but not ptc and en (Fig. 2a-c). However, SMO 
variants with two or more mutated Arg clusters exhibited a progres- 
sive increase in their constitutive signalling activities (Fig. 2d-i). 
Thus, SMO activity is inversely correlated with the number of func- 
tional Arg clusters. We also mutated several Arg clusters in the mem- 
brane-proximal region of the SMO cytoplasmic tail and observed no 
effect on SMO cell surface expression and activity (Supplementary 
Fig. 2). Hence, the Arg clusters between amino acids 661-818 are 
specifically involved in SMO autoinhibition. 


Phosphorylation counteracts the Arg motifs 
Increasing the number of phosphorylation-mimetic mutations in 
PKA/CKI sites resulted in a graded increase in SMO cell surface level 
and activity, which phenocopies the effect of increasing the number 
of RA mutations, indicating that phosphorylation may activate SMO 
by antagonizing the Arg motifs. Consistently, an internal deletion 
that removes both the phosphorylation and Arg clusters (SMOΔ661-818) 
results in high levels of SMO cell surface expression and activity 
(Figs 1a, c and 2j). 

It is intriguing that the Arg clusters are situated adjacent to the 
PKA/CKI phosphorylation clusters (Fig. 1d). In fact, R1, R2 and R4 
are part of the PKA phosphorylation consensus site, R/KRXS. The 
juxtaposition of the Arg and phosphorylation clusters may allow 
precise control of SMO activity because phosphorylation at 
individual clusters may only neutralize the negative influence of 
adjacent Arg clusters. To test this, we constructed SMORA12D3 
and found it behaved like SMORA124 (Fig. 1d, e; compare Fig. 2l 
with 2h), suggesting that phosphorylation at S3 (Fig. 1d) neutralizes 
the negative effect of R4. 

Figure 1 | Regulation of SMO cell surface expression by multiple Arg 
clusters. a, SMO deletion mutants with CFP (not shown) fused to their 
C termini. Filled boxes indicate the transmembrane domains. b, Ala-scan 
mutagenesis of SMOΔC730 with the last 20 amino acids and corresponding 
substitutions shown underneath. c, e, Cell surface expression of the indicated 
SMO mutants. S2 cells were transfected with the indicated CFP-tagged SMO 
constructs, followed by immunostaining with anti-SMON antibody before 
membrane permeabilization. The SMON column indicates cell surface 
staining, whereas the CFP column indicates the total protein distribution. 
SMOΔ730RA and SMOΔ730TA behaved like SMOΔ710 and SMOΔ730, 
respectively (data not shown). d, A schematic drawing of a full-length SMO 
with the sequences of the four Arg clusters (R1-R4) and three 
phosphorylation clusters (S1-S3) shown underneath. SMO variants with the 
indicated substitutions are listed. f, Ratio of cell surface level (SMON signal) 
to total level (CFP signal) of protein for wild-type and indicated mutant forms 
of SMO. For SMO variants, n = 20; error bars, 1 s.d. 

Because Arg carries positive charge whereas phosphorylation 
brings in negative charge, phosphorylation may antagonize the Arg 
clusters by neutralizing their positive charges. In support of this 
model, we found that R3 and R4 can be functionally substituted by 
Lys, because SMORA12K34 behaved like SMORA12 rather than 
SMORA1234 (Fig. 1d, e, 2k). Furthermore, SMOEDS, which has 
three PKA/CKI phosphorylation clusters replaced by a stretch of 
acidic amino acids (Fig. 1d), exhibited high levels of cell surface 
expression and signalling activity similar to the phosphorylation- 
mimetic SMO variant, SMOSD123 (Fig. 1d, e, 2m; ref. 15), suggest- 
ing that the exact sequence composition of the phosphorylation 
clusters is not critical, but rather the negative charges they carry are 
important. 


HH induces increased proximity of SMO cytoplasmic tails 

Although SMO activity correlates with its cell surface levels, HH may 
induce SMO activation through additional mechanism(s) such as 
dimerization and/or conformational change. To test these possibilities, 
we employed fluorescence resonance energy transfer (FRET) analysis, 
which measures the transfer of energy between yellow fluorescent 
protein (YFP) and cyan fluorescent protein (CFP) as a function of 
distance. We initially constructed two pairs of tagged SMO with 
CFP/YFP either fused to the C terminus (SMO-CFPC/SMO-YFPC) 
or inserted at an amino-terminal position (SMO-CFPN/SMO-YFPN) 
of SMO (Fig. 3a, b; Supplementary Fig. 3). As controls, we constructed 
CFP/YFP-tagged forms of frizzled 2 (FZ2) and RAB5. 

Figure 2 | In vivo activities of SMO variants. a-n, Wing discs expressing the 
indicated SMO variants from MS1096 were immunostained to show the 
expression of dpp-lacZ (dppZ), ptc-lacZ (ptcZ) and EN. dppZ, ptcZ and en are 
induced by low, intermediate and high levels of HH, respectively. The levels of 
SMO activity inversely correlate with the number of intact Arg clusters 
(b-i). An internal deletion removing amino acids 661-818 resulted in high 
levels of constitutive SMO activity (j). SMORA12K34 (k) has similar activity 
to SMORA12 (compare to d). SMORA12D3 (l) exhibited constitutive activity 
similar to that of SMORA124 (h). m, Substitution of the three PKA/CKI 
phosphorylation clusters with acidic clusters led to high constitutive activity 
similar to that of SMOSD123 (ref. 15). n, SMOAN1234 (see Fig. 4a) did not 
exhibit higher basal activity than SMOWT (compare to a). 

Consistent with a previous finding that FZ family members form 
constitutive dimers/oligomers, we observed high FRET between 
FZ2-CFPC/FZ2-YFPC (17.3 ± 1.9%) or FZ2-CFPN/FZ2-YFPN 
(12.6 ± 1.1%) in S2 cells (Fig. 3c, d). Under similar conditions, 
FRET between SMO-CFPN/SMO-YFPN (referred to as FRETN) 
was 14.1 ± 1.4% (Fig. 3c), whereas FRET between SMO-CFPC/ 
SMO-YFPC (FRETC) was 5.7 ± 1.3% (Fig. 3d). HH stimulation 
significantly increased FRETC to 21.7 ± 1.5% (Fig. 3d), but only 
modestly increased FRETN (Fig. 3c). FRET between control pairs (SMO/ 
FZ2 or SMO/RAB5) was ≤1.0% (Fig. 3c, d). In addition, CFP- and 
YFP-tagged SMO colocalized whereas SMO-CFP barely overlapped 
with FZ2-YFP (Supplementary Fig. 4). Even in S2 cells stimulated 
with HH, in which SMO accumulated on the cell surface and 
overlapped with FZ2, FRET between SMO/FZ2 remained low (Fig. 3c, d, 
and Supplementary Fig. 4). Furthermore, over fourfold changes in 
SMO signal intensity did not significantly affect FRETC 
(Supplementary Fig. 5). 

In wing discs, FRETN was high in both A and P compartments 
regardless of HH (Fig. 3e), whereas FRETC in A-compartment cells 
distant from the A/P boundary was relatively low but increased 
significantly in P-compartment cells and in A-compartment cells 
exposed to HH or lacking PTC (Fig. 3f; Supplementary Figs 6-7a). 
The high basal FRETN suggests that SMO forms a constitutive 
dimer/oligomer (dimer is used hereafter for simplicity), as is the case 
for the FZ family. Constitutive SMO dimerization was confirmed by 
immunoprecipitation assays (Supplementary Figs 8 and 9). SMO 
dimerization is likely to be mediated by SMON, which includes 
the N-terminal extracellular domain and transmembrane helices, 
because SMON-CFPN and SMON-YFPN colocalized and produced 
high basal FRET (Fig. 3c). 

Figure 3 | Regulation of both conformation and proximity of SMO 
cytoplasmic tails. a, b, j, Cartoons of SMO-CFPN/SMO-YFPN 
(a), SMO-CFPC/SMO-YFPC (b), and SMO-CFPL3YFPC dimers (j). The 
filled and open circles indicate CFP and YFP, respectively. c, d, g-i, k, m, FRET 
efficiency (y axis and numbers below bars) from the indicated CFP/YFP- 
tagged constructs in S2 cells treated with or without HH-conditioned 
medium, and with or without PKA inhibitor H89 or GSK3 inhibitor LiCl 
(mean ± s.d., n = 10). SMOSA has three PKA sites mutated to Ala, whereas 
SMOSD1, SMOSD12 and SMOSD123 have PKA and CKI sites in one, two and 
three phosphorylation clusters converted to Asp, respectively (c, g, k; ref. 
15). SMON lacks a cytoplasmic tail (c). RA1, RA12, RA123 and RA1234 have 
one, two, three and four Arg clusters mutated to Ala, respectively (i, m; see 
Fig. 1d). AN1234 has four C-terminal acidic clusters mutated to Ala (m; see 
Fig. 4a). HH increased FRETC between SMO-CFPC/SMO-YFPC (d) and 
decreased FRETL3C from SMO-CFPL3YFPC (k), both of which were blocked 
by the SA mutations (g) or H89 but not by LiCl (h, k). Phospho-mimetic or 
RA mutations progressively increased basal FRETC (g, i) whereas they 
gradually decreased FRETL3C (k, m). e, f, l, FRET efficiency between 
SMO-CFPN/SMO-YFPN (e), SMO-CFPC/SMO-YFPC (f), or from 
SMO-CFPL3YFPC (l) expressed in wing discs (mean ± s.d., n = 5). A, 
A-compartment cells away from the A/P boundary; P, P-compartment cells; 
A/P, A-compartment cells adjacent to the A/P boundary; A + HH, 
A-compartment cells expressing UAS-HH. 

The low basal but high HH-induced FRETC suggests that the 
two SMO cytoplasmic tails within a dimer are separated from each 
other but HH signalling increases their proximity. To investigate 
whether increased proximity is accompanied by a conformational 
change, we generated a doubly tagged SMO (SMO-CFPL3YFPC) with 
CFP inserted into the third intracellular loop (L3) and YFP fused to 
the C terminus (Fig. 3j; Supplementary Fig. 3). SMO-CFPL3YFPC 
responded to HH and possessed signalling activity (Supplementary 
Figs 10, 11). In S2 cells, basal FRET from SMO-CFPL3YFPC (referred 
to as FRETL3C) was 12.9 ± 1.2% but dropped to 4.3 ± 0.8% after HH 
treatment (Fig. 3k; Supplementary Fig. 11). In wing discs, FRETL3C 
was 24.3 ± 2.1% in A-compartment cells distant from the A/P 
boundary, but dropped to 6.5 ± 1.5% in A-compartment cells near 
the A/P boundary or to 7.3 ± 1.6% in P-compartment cells (Fig. 3l; 
Supplementary Fig. 12). FRETL3C was also reduced to 5.9 ± 1.2% in 
A-compartment ptc mutant clones (Supplementary Fig. 7b). The 
high basal FRETL3C is probably due to close proximity between the 
C terminus and L3 of the same SMO molecule (Supplementary Fig. 
13). These results suggest that SMO adopts a closed inactive 
conformation, with its C terminus in close proximity to L3, in quiescent 
cells. HH induces SMO to adopt an open active conformation 
in which its C terminus moves away from L3 but closer to the 
C terminus of its binding partner. 
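
FRET efficiency falls off with the sixth power of the donor-acceptor distance, which is why a drop in FRET between L3 and the C terminus can be read as the two moving apart. The sketch below uses the standard Förster relation E = 1 / (1 + (r/R0)^6). The Förster radius of 4.9 nm for a CFP/YFP pair is an assumed, typical literature value and not a number from this study. Ensemble FRET efficiencies measured in cells also depend on fluorophore orientation and on the fraction of molecules in each state, so any distance obtained this way is illustrative only.

```python
def fret_efficiency(distance_nm: float, r0_nm: float = 4.9) -> float:
    """Forster efficiency for a donor-acceptor pair separated by distance_nm.

    r0_nm is the Forster radius (distance at 50% efficiency); the default
    is an assumed value for CFP/YFP, not taken from the article.
    """
    if distance_nm <= 0 or r0_nm <= 0:
        raise ValueError("distances must be positive")
    return 1.0 / (1.0 + (distance_nm / r0_nm) ** 6)


def apparent_distance(efficiency: float, r0_nm: float = 4.9) -> float:
    """Invert the Forster relation to get an apparent separation in nm."""
    if not 0.0 < efficiency < 1.0:
        raise ValueError("efficiency must lie strictly between 0 and 1")
    return r0_nm * (1.0 / efficiency - 1.0) ** (1.0 / 6.0)


if __name__ == "__main__":
    # Efficiencies reported in the text for S2 cells, before and after HH.
    for label, eff in (("basal", 0.129), ("after HH", 0.043)):
        print(f"{label}: E = {eff:.3f} -> apparent distance {apparent_distance(eff):.2f} nm")
```

Because of the sixth-power dependence, a threefold drop in efficiency corresponds to a comparatively modest increase in apparent separation, so FRET is most informative for distances close to the Förster radius.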


Phosphorylation regulates SMO conformation 

To determine whether the conformational change and increased proximity of 
SMO cytoplasmic tails are regulated by phosphorylation, we mutated 
three PKA sites (Ser 667, Ser 687 and Ser 740) to Ala (SA) or 
substituted them and adjacent CKI sites with Asp (SD123, or SD for 
simplicity). The HH-induced increase in FRETC or decrease in 
FRETL3C was blocked by the SA mutation as well as by the PKA inhibitor 
H89 (Fig. 3g, h, k), whereas the SD123 substitution resulted in high 
basal FRETC but low basal FRETL3C (Fig. 3g, k). In contrast, neither 
the basal nor the HH-induced FRETN was significantly affected by 
the SA or SD123 mutation (Fig. 3c), suggesting that constitutive 
SMO dimerization is not regulated by phosphorylation, but that the 
conformational change and increased proximity of SMO cytoplasmic tails 
are triggered by phosphorylation. Mutating multiple Arg clusters also 
resulted in high basal FRETC but low basal FRETL3C (Fig. 3i, m), 
suggesting that the Arg motifs keep SMO cytoplasmic tails in a closed 
inactive conformation. 

To assess direct physical interaction between SMO cytoplasmic 
tails and its regulation by phosphorylation, we applied the CytoTrap 
yeast two-hybrid assay (Methods Summary). Wild-type SMO 
cytoplasmic tail (SMOCWT) failed to self-associate, whereas 
phospho-mimetic SMO cytoplasmic tail (SMOCSD) could self-associate and 
also interact weakly with SMOCWT (Supplementary Fig. 14), 
indicating that phosphorylation of the SMO cytoplasmic tail may 
promote self-association. 

Our previous study suggests that graded SMO activities are gov- 
erned by SMO phosphorylation levels’*. To determine if increasing 
SMO phosphorylation could induce gradual changes in SMO con- 
formation, we compared FRET® and FRET’ for several phosphor- 
ylation-mimetic forms of SMO. SMO*?!, SMOSP? and SMOS? 
contain Ser to Asp substitution in one, two and three phosphor- 
ylation clusters, respectively, and exhibit progressively higher levels 
of basal activity'’®. Interestingly, they also exhibited a progressive 
increase in basal FRET© (Fig. 3g) and gradual decrease in basal 
FRET'°© (Fig. 3k). Furthermore, FRET© progressively increased, 
whereas FRET’*© gradually decreased, when increasingly more Arg 
clusters were mutated (Fig. 3i, m). Thus, increasing SMO phosphor- 
ylation seems to induce progressive changes in SMO conformation 
by antagonizing the Arg clusters. SMO may adopt a series of 
conformational states determined by its phosphorylation levels. 
Alternatively, SMO may switch in equilibrium between two distinct 
conformational states: a closed inactive conformation and an open 
active conformation; phosphorylation increases the probability for 
individual SMO to adopt the open active conformation. 


Arg clusters mediate intramolecular interaction 


To determine how Arg clusters keep SMO in a closed inactive 
conformation, we tested the possibility that they might be involved 
in intramolecular interactions. A glutathione S-transferase (GST) 
fusion protein (GST-SMO656-755) that contains the SMO region 
between amino acids 656–755 (referred to hereafter as SAID, for SMO 
auto-inhibitory domain) was tested for interaction with a set of 
C-terminal fragments, and a minimal SAID-interacting fragment 
(NT860) was identified that contains the C-terminal region between 
amino acids 860–1035 (Supplementary Fig. 15). The SAID–NT860 
interaction was diminished by PKA/CKI phosphorylation as well as by 
the RA or SD123 mutations (Fig. 4b, c), and the binding affinity 
gradually decreased as more phosphorylation or Arg clusters were 
mutated (Fig. 4d). The importance of Arg clusters in the SAID–NT860 
interaction indicates that the association may be mediated by 
electrostatic interactions. Indeed, mutating several acidic clusters 
in the C-terminal half of NT860 gradually diminished the SAID–NT860 
interaction (Fig. 4a, e). 

The electrostatic interaction between NT860 and SAID may result 
in a folding back of the SMO cytoplasmic tail to form a closed con- 
formation (Fig. 4f). Consistently, mutating the acidic clusters 
(SMO^AN1234) resulted in decreased basal FRET^L2C (Fig. 3m). 
However, unlike RA mutations, which not only caused conformational 
change but also promoted SMO cell surface accumulation, 
SMO^AN1234 exhibited little if any cell surface expression and did 
not exhibit high levels of constitutive activity (Fig. 2n), indicating 
that both cell surface accumulation and conformational change may 
be critical for SMO activity. 


Clustering of the SMO cytoplasmic tail activates the HH pathway 
To assess the biological significance of SMO dimerization, we 
analysed two SMO mutants with point mutations in the N-terminal 
extracellular domain: SMO^IA3 is encoded by a hypomorphic allele 
in which Cys 90 is substituted with Ser, and SMO^F11 is encoded by a 
strong allele that changes Cys 155 to Tyr (ref. 22). Both mutations 
reduced basal as well as HH-induced FRET^N and FRET^C, with 
SMO^F11 exhibiting more severe defects (Supplementary Fig. 16). 
Immunoprecipitation assays indicated that SMO^IA3 and SMO^F11 
failed to dimerize with SMO^WT (Supplementary Fig. 17a). Unlike 
SMO^WT, neither SMO^IA3 nor SMO^F11 was phosphorylated in 
response to HH (Supplementary Fig. 17b). In addition, both SMO^IA3 
and SMO^F11 lost HH-induced activity (Fig. 5a). 

If loss of SMO activity was due to compromised dimerization, 
restoring dimerization to these mutants should rescue their activities. 
To test this, we developed an inducible dimerization system by taking 
advantage of the observation that the mammalian receptor tyrosine 
kinase EphB2 forms a hetero-tetramer with its ligand ephrin B2 (EB2; 
also known as Efnb2) to trigger bidirectional signalling”. Accord- 
ingly, we constructed EB2-SMO chimaeric proteins in which the 
extracellular domain of EB2 was inserted into the SMO N-terminal 
extracellular domain (Supplementary Fig. 3). When expressed in cl-8 
cells, EB2-SMO^IA3 and EB2-SMO^F11 failed to be activated by 
HH-conditioned medium; however, they were activated when cells were 
exposed to the soluble pre-clustered EphB2 extracellular domain, 
EphB2-Fc (Fig. 5a). In addition, FRET^C between mutant pairs of 
EB2-SMO-CFP^C/EB2-SMO-YFP^C increased significantly in 
response to EphB2-Fc but not HH (Fig. 5b). 

To determine if dimerization of the SMO cytoplasmic tail suffices 
to activate the HH pathway, we constructed EB2-SMO cytoplasmic- 
tail chimaeric proteins in which the intracellular domain of EB2 
was replaced by the wild-type (EB2-SMOC^WT), 
phosphorylation-deficient (EB2-SMOC^SA), or phosphorylation-mimetic 
(EB2-SMOC^SD) SMO cytoplasmic tail (Fig. 5e and Supplementary 
Fig. 3). In both cl-8 cells and wing discs, EB2-SMOC^WT exhibited low 
basal activity but was markedly stimulated by EphB2 (Fig. 5c, f). 
Furthermore, EB2-SMOC^WT activated the HH pathway independently 
of endogenous SMO (Supplementary Fig. 18). EB2-SMOC^WT 
also induced FU phosphorylation in response to EphB2-Fc 
(Supplementary Fig. 19a). In addition, FRET between 
EB2-SMOC^WT-CFP^C/EB2-SMOC^WT-YFP^C increased significantly in 
response to EphB2-Fc (Fig. 5d). 

EB2-SMOC^SA did not significantly activate any HH target genes 
even after clustering by EphB2 (Fig. 5c, f). PKA-site mutation may lock 
the cytoplasmic tails in a closed inactive conformation that prevents 
their association. Consistent with this model, FRET between 
EB2-SMOC^SA-CFP^C/EB2-SMOC^SA-YFP^C remained low after EphB2-Fc 
treatment (Fig. 5d). EphB2-Fc treatment induced phosphorylation 
of EB2-SMOC, which was abolished by the SA mutation and H89 
(Supplementary Fig. 19b), suggesting that EphB2/EB2-induced clus- 
tering of SMO cytoplasmic tails promoted their phosphorylation and 
close proximity, leading to HH pathway activation. 

EB2-SMOC^SD exhibited high basal activity, yet its activity was 
further enhanced by EphB2 (Fig. 5c, f). In addition, FRET between 
EB2-SMOC^SD-CFP^C/EB2-SMOC^SD-YFP^C increased after EphB2-Fc 
treatment (Fig. 5d). Thus, even though ‘phosphorylated’ SMO 
cytoplasmic tails may adopt an open conformation that allows them 
to interact more avidly, as suggested by their high basal FRET^C 
(Fig. 5d), EphB2/EB2-induced clustering further increased their 
proximity, leading to enhanced pathway activation. These results 
further underscore the importance of close proximity between 
SMO cytoplasmic tails for pathway activation. 


Regulation of mammalian SMO 

In response to SHH and PTC inactivation, mammalian SMO (Smo) 
translocates to primary cilia, which is thought to trigger pathway 
activation **. To determine if SHH may also regulate Smo 
conformation, we constructed C- or N-terminally CFP/YFP-tagged 
Smo, as well as a doubly tagged Smo with CFP inserted into the second 
intracellular loop (L2) and YFP fused to the C terminus 
(Supplementary Fig. 20a). All tagged forms exhibited activities 
similar to that of the untagged wild-type form (Supplementary 
Fig. 20b). Like Drosophila SMO, Smo also exhibited high basal FRET^N 
and low basal FRET^C; however, SHH as well as an oncogenic mutation 
(A1)”” induced significant increases in FRET^C (Supplementary 
Fig. 20c, d). In addition, both SHH and the A1 mutation reduced FRET 
from Smo-CFP^L2 YFP^C (FRET^L2C; Supplementary Fig. 20e), indicating 
that Smo may also exist as a constitutive dimer and that SHH induces 
a conformational change, leading to increased proximity of Smo 
cytoplasmic tails. Interestingly, induced clustering of full-length 
Smo through the ephrin B2/EphB2 system also triggered pathway 
activation; however, unlike Drosophila SMO, clustering of Smo cyto- 
plasmic tails failed to activate the pathway (Supplementary Fig. 21). 
It is possible that other intracellular domains such as L3 may be 
essential for inducing the active conformation of Smo and/or recruit- 
ing the intracellular signalling complex because point mutations in 
L3 inactivate Smo”. 

Vertebrate SMO proteins contain multiple conserved clusters of 
basic residues in their cytoplasmic tails, including a long stretch of 
Arg/Lys residues in the central region (Supplementary Fig. 22a). 
Interestingly, mutating this long stretch of Arg/Lys residues to Ala 
resulted in constitutive activity of Smo, increased FRET^C and 
decreased FRET^L2C (Supplementary Fig. 22b–e), indicating that 
Smo may employ an Arg/Lys cluster to regulate its conformation 
and activity. 


Discussion 


The prevalent view regarding SMO regulation is that SMO is 
activated as a result of subcellular compartmentation'*'>**°*?. Here we 
provide substantial evidence that SMO activity is also regulated by a 
conformational switch. In particular, we identified an autoinhibitory 
domain (SAID) in the Drosophila SMO cytoplasmic tail, containing 
multiple Arg clusters that keep SMO in a closed inactive 
conformation through intramolecular electrostatic interaction 
(Fig. 4f). HH-induced phosphorylation disrupts this interaction and 
triggers a conformational switch and increased proximity of SMO 
cytoplasmic tails, which may further promote recruitment and 
interaction of intracellular signalling complexes*’~’. Our results 
also indicate that the Arg clusters may promote endocytosis and 
degradation of SMO, whereas multiple phosphorylation events 
neutralize the negative effect of the Arg clusters by inhibiting 
endocytosis and/or promoting recycling of SMO. 

Figure 4 | The Arg clusters mediate intramolecular electrostatic 
interaction. a, Diagram of SMO with the SAID indicated by a grey box, and 
NT860 with indicated substitutions. b, GST pull-down assay using 
GST-SAID and S2 cell extracts expressing the indicated Flag-tagged SMO 
C-terminal fragments. c, d, GST pull-down experiments using wild-type or 
the indicated mutant GST-SAID and S2 cell extracts expressing Flag-tagged 
NT860 (c) or in vitro translated 35S-labelled NT860 (d). e, Autoradiography 
(upper panel) and quantification (lower panel) of a GST pull-down assay 
using GST-SAID and in vitro translated 35S-labelled wild-type (WT) or 
mutant (AN) NT860. I, input; P, pulled-down protein. The binding 
affinity is indicated by the ratio of pulled-down protein (pull down) to 
input. f, A model for regulating SMO conformation by multiple Arg clusters 
and HH-induced phosphorylation; see text for detail. 

A striking feature of the SAID domain is that it contains multiple 
regulatory modules, each of which consists of an Arg cluster linked to 
a phosphorylation cluster. The pairing of positive and negative 
regulatory elements may offer precise regulation, because 
phosphorylation at a given cluster may only neutralize adjacent 
negative element(s), leading to an incremental change in SMO 
activity. We propose that increasing phosphorylation gradually 
neutralizes the negative effect of multiple Arg clusters, leading to a 
progressive increase in SMO cell surface expression and activity 
(Supplementary Fig. 23). Thus, by employing multiple Arg clusters as 
inhibitory elements that are counteracted by differential 
phosphorylation, SMO acts as a rheostat to translate graded HH 
signals into distinct responses. 

Figure 5 | Clustering of SMO cytoplasmic tails triggers HH pathway 
activation. a, c, The ptc-luc reporter assay in cl-8 cells transfected with the 
indicated SMO expression constructs and treated with or without 
HH-conditioned medium or EphB2-Fc. Error bars, 1 s.d. (triplicate wells). 
b, d, FRET between wild-type or mutant pairs of EB2-SMO-CFP^C/ 
EB2-SMO-YFP^C (b) or EB2-SMOC-CFP^C/EB2-SMOC-YFP^C 
(d) expressed in S2 cells treated with or without HH-conditioned medium or 
EphB2-Fc (mean ± s.d., n = 10). e, Cartoon of the EB2-SMOC-EphB2 
complex. f, Wing discs expressing the indicated SMO constructs with or 
without EphB2Z under the control of the MS1096 Gal4 driver were 
immunostained to show the expression of dpp-lacZ (dppZ), PTC and EN. Of 
note, EB2-SMOC^SD exhibited higher basal activity than EB2-SMOC^WT 
because it induced higher levels of ectopic dppZ and also induced ectopic, 
albeit low, levels of ptc. When coexpressed with EphB2Z, both EB2-SMOC^WT 
and EB2-SMOC^SD ectopically activated high levels of ptc and low levels of 
en. In contrast, EB2-SMOC^SA failed to activate any HH target genes. 


METHODS SUMMARY 


smo? and ptc are strong alleles of smo and ptc, respectively 
(http://flybase.bio.indiana.edu/). MS1096, ptc-Gal4, dpp-lacZ, 
UAS-smo-CFP^C/UAS-smo-YFP^C and their mutant derivatives have been 
described'*. Drosophila smo and 
mouse Smo constructs were generated using the pUAST and pGE vectors, 
respectively. Amino acid substitutions were generated using PCR-based muta- 
genesis. Fly transformants were generated by standard P-element mediated 
transformation. Multiple independent transgenic lines were tested for activity. 
Immunostaining was carried out as described**. S2 and cl-8 cells were cultured as 
described****. Treatment of transfected cells with HHN-conditioned medium 
and ptc-luc reporter assays were carried out as described’. Cell surface staining 
was carried out as described'*. NIH-3T3 cells were cultured in DMEM medium. 
Mammalian reporter assays were performed essentially as described”’. For GST 
pull-down assays, S2 cell lysates or reticulocytes with in vitro translated 
35S-labelled proteins were incubated with GST fusion proteins absorbed on 
glutathione beads. Proteins bound to the beads were separated on SDS-PAGE, 
followed by western blot or autoradiography. Immunoprecipitation and western 
blot analysis were carried out using standard protocols. Yeast two-hybrid assays 
were carried out using Stratagene’s CytoTrap system according to the 
manufacturer’s instructions. For FRET analysis, a Zeiss LSM510 confocal 
microscope was used. CFP was excited at 458 nm wavelength and the emission 
was collected through a BP 480-520 nm filter. YFP was excited at 514 nm 
wavelength and the emission was collected through a BP 535-590 nm filter. 
CFP signal was obtained once before (BP) and once after (AP) photobleaching 
YFP using the full power of the 514 nm laser line for 1-2 min at the top half 
of each cell or selected disc area, leaving the bottom as an internal control. 
The intensity change of CFP was analysed using the Metamorph software 
(Universal Imaging). The efficiency of FRET was calculated using the formula: 
FRET% = [(CFP_AP − CFP_BP)/CFP_AP] × 100. 
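
The acceptor-photobleaching formula above can be illustrated with a short Python sketch. The function names and the intensity values are hypothetical and chosen only for illustration; they are not taken from the paper or from the Metamorph software.

```python
from statistics import mean, stdev


def fret_efficiency(cfp_before: float, cfp_after: float) -> float:
    """FRET% = [(CFP_AP - CFP_BP) / CFP_AP] x 100.

    cfp_before: CFP intensity before photobleaching YFP (BP).
    cfp_after:  CFP intensity after photobleaching YFP (AP).
    """
    if cfp_after <= 0:
        raise ValueError("CFP intensity after photobleaching must be positive")
    return (cfp_after - cfp_before) / cfp_after * 100.0


def summarize(rois):
    """Mean and sample s.d. of FRET% over (BP, AP) intensity pairs."""
    values = [fret_efficiency(bp, ap) for bp, ap in rois]
    return mean(values), stdev(values)


# Made-up (BP, AP) CFP intensities for three regions of interest.
example_rois = [(80.0, 100.0), (90.0, 100.0), (85.0, 100.0)]
average, spread = summarize(example_rois)
print(f"FRET% = {average:.1f} +/- {spread:.1f}")
```

A rise in donor (CFP) intensity after the acceptor (YFP) is bleached gives a positive value; an unbleached control region should give a value close to zero.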


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 


METHODS 

Constructs and transgenes. SMO C-terminal deletion constructs were 
generated by PCR amplification of the corresponding coding sequences, followed by 
subcloning into a pUAST vector containing the CFP coding sequence so that CFP 
was fused to the C terminus of each SMO deletion mutant. For Ala-scan 
mutagenesis and the F11, IA3, RA, SA and SD mutations, substitutions were 
generated by PCR-based site-directed mutagenesis. For UAS-SMO-CFP^L2 YFP^C, 
the CFP coding sequence was inserted between SMO amino acids 451 and 452, and 
YFP was fused to the C terminus. To construct mutant forms of SMO-CFP^C/YFP^C or 
SMO-CFP^L2 YFP^C, the corresponding mutant sequences were swapped by using 
a unique SpeI site in the seventh transmembrane domain. To construct wild-type 
or mutant forms of SMO-CFP^N/YFP^N, CFP/YFP was inserted in frame into a 
unique SfiI site near the N-terminal region. To construct EB2-SMOC chimaeric 
proteins, the intracellular domain of EB2 was replaced by wild-type or 
mutant forms of the SMO cytoplasmic tail (amino acids 556–1035). To construct 
EB2-SMO^WT, EB2-SMO^F11 and EB2-SMO^IA3, the extracellular domain of EB2 
was fused to the SMO sequence encoding amino acids 33–1035. EphB2Z contains a 
full-length EphB2 fused to β-galactosidase to facilitate oligomerization”’. To 
generate GST-SMO fusion constructs, smo complementary DNA fragments 
encoding amino acids 656–755 with wild-type sequence or point mutations were 
amplified by PCR and inserted between the NotI and EcoRI sites in the pGEX4T-2 
vector. To generate Flag-tagged SMO C-terminal fragments such as SMO-NT860, 
the corresponding cDNA fragments were amplified by PCR and subcloned 
into a pUAST-Flag vector. To construct Smo-CFP^C/YFP^C, CFP/YFP was 
fused in frame to the Smo C terminus. To construct Smo-CFP^N/YFP^N, CFP/YFP 
was inserted in frame after amino acid 31. To construct Smo-CFP^L2 YFP^C, the 
CFP coding sequence was inserted between Smo residues 355 and 356, and YFP 
was fused in frame to the C terminus. For EB2-Smo, the extracellular domain of 
EB2 was fused N-terminally to the full-length Smo. For EB2-SmoC, the 
intracellular domain of EB2 was replaced by the Smo cytoplasmic tail (amino acids 
544–793). Multiple independent transgenic lines were tested for each construct. 
MS1096, ptc-gal4, dpp-lacZ, UAS-smo-CFP^C/YFP^C and their mutant derivatives 
have been described">. 

Cell culture, immunoprecipitation, GST pull-down, western blot, immuno- 
staining and luciferase reporter assay. S2 and cl-8 cells were cultured as 
described****. Transfection was carried out using the Calcium Phosphate 
Transfection Kit (Speciality Media). HH-conditioned medium treatment was 
carried out as described. Immunoprecipitation and western blot analysis were 
carried out using standard protocols. For cell surface staining, transfected cells 
were fixed with 4% paraformaldehyde and incubated with primary antibody in 
PBS for 30 min at room temperature, followed by incubation with secondary 
antibody in PBT. For GST pull-down assays, GST fusion proteins absorbed on 
glutathione beads were washed three times with ice-cold PBS containing 1% 
NP40. Cell lysates from S2 cells expressing tagged SMO C-terminal fragments or 
reticulocytes with in vitro translated 35S-labelled SMO C-terminal fragments 
were then added and the mixtures were incubated at 4 °C for 1 h with occasional 
mixing. Proteins bound to the beads were washed five times with PBS plus 1% 
NP40 before separation on SDS-PAGE, followed by western blot or 
autoradiography. For EphB2-Fc treatment, EphB2-Fc chimaera (R&D Systems) and goat 
anti-human IgG Fc (Jackson Immunoresearch Labs) were mixed for 4 h at 4 °C 
before being added to cultured cells. NIH-3T3 cells were cultured in DMEM 
containing 10% bovine calf serum and the antibiotics penicillin/streptomycin at 5% 
CO2 in a humidified incubator. Transfection of NIH-3T3 cells was carried out 
using FuGENE6 (Roche). Briefly, 2 days after transfection, the cell culture 
medium was changed to DMEM with 0.5% bovine calf serum with or without 
recombinant mouse SHHN (R&D Systems). Mammalian reporter assays were 
performed essentially as described’’. Immunostaining of imaginal discs was 
carried out as described”. Antibodies used in this study were: rabbit anti-βGal 
(Cappel), mouse anti-PTC (from I. Guerrero), mouse anti-EN (DSHB), rabbit 
and mouse anti-Flag (Sigma), mouse anti-SMON (DSHB) and mouse anti-Myc 
(Santa Cruz). 

FRET analysis using confocal microscopy. For FRET analysis of cultured cells, 
CFP- and YFP-tagged constructs were transfected into S2 cells together with an 
ub-Gal4 expression vector’. Transfected cells were treated with or without HH- 
conditioned medium. For maximal HH signalling strength, a UAS-HH expres- 
sion construct was also included in the transfection**. Cells were washed with 
PBS, fixed with 4% formaldehyde for 20 min, and mounted on slides in 80% 
glycerol. For FRET analysis of wing discs, smo transgenes were expressed with 
MS1096 (for analysis of A- or P-compartment cells) or ptc-Gal4 (for analysis of 
A-compartment cells near the A/P boundary). Late third instar wing discs were 
fixed with 4% formaldehyde and mounted on slides in 80% glycerol. Fluorescence 
signals were acquired with the ×100 objective of a Zeiss LSM510 confocal 
microscope. Each data set was based on 10-15 individual cells. In each cell, four to five 
regions of interest in photobleached area were selected for analysis. 

Yeast two-hybrid assay. The prey and bait plasmids were constructed using the 
C-terminal fragment of SMO (amino acids 641-1035). 


High-resolution structure prediction and 
the crystallographic phase problem 


Bin Qian*, Srivatsan Raman*, Rhiju Das*, Philip Bradley, Airlie J. McCoy, Randy J. Read & David Baker 


The energy-based refinement of low-resolution protein structure models to atomic-level accuracy is a major challenge for 
computational structural biology. Here we describe a new approach to refining protein structure models that focuses 
sampling in regions most likely to contain errors while allowing the whole structure to relax in a physically realistic all-atom 
force field. In applications to models produced using nuclear magnetic resonance data and to comparative models based on 
distant structural homologues, the method can significantly improve the accuracy of the structures in terms of both the 
backbone conformations and the placement of core side chains. Furthermore, the resulting models satisfy a particularly 
stringent test: they provide significantly better solutions to the X-ray crystallographic phase problem in molecular 
replacement trials. Finally, we show that all-atom refinement can produce de novo protein structure predictions that reach 
the high accuracy required for molecular replacement without any experimental phase information and in the absence of 
templates suitable for molecular replacement from the Protein Data Bank. These results suggest that the combination of 
high-resolution structure prediction with state-of-the-art phasing tools may be unexpectedly powerful in phasing 
crystallographic data for which molecular replacement is hindered by the absence of sufficiently accurate previous models. 


High-resolution prediction of protein structures from their amino 
acid sequences and the refinement of low-resolution protein struc- 
ture models to produce more accurate structures are long-standing 
challenges in computational structural biology’. The refinement 
problem has become particularly important in recent years, as the 
continued increase in the number of experimentally determined pro- 
tein structures, together with the explosion of genome sequence 
information, has made it possible to produce comparative models 
of a large number of protein structures with wide utility’. Ideally, 
these models would consistently approach the resolution offered by 
X-ray crystallography, enabling precise drug design and a deeper 
understanding of catalysis and binding. Accurate high-resolution 
models can, in principle, be achieved by searching for the lowest 
energy structure given the sequence of the protein. However, despite 
progress’, the large number of degrees of freedom in a protein chain 
and the ruggedness of the energy landscape produced by strong 
atomic repulsion at short distances greatly complicate this search 
for sequences lacking close homologues of known structure. 

An important application for predicted structures is to help solve 
the X-ray crystallographic phase problem*°. Converting X-ray dif- 
fraction data into electron density maps of proteins requires the 
inference of phases associated with each diffraction peak. Although 
phase estimates can be obtained through the preparation of heavy 
atom derivatives, the problem can be solved without additional 
experimental information by the technique of molecular replace- 
ment*? given a structure model that has high structural similarity 
(better than 1.5 Å root-mean-squared (r.m.s.) deviation) to the 
crystallized protein over a large fraction of the molecule. As an example of 
the stringency of this condition, models of protein structures derived 
from nuclear magnetic resonance (NMR) data typically do not give 
good molecular replacement models for crystallographic data on the 
same proteins®. Perhaps the most successful approach to molecular 
replacement is the use of previous crystal structures of highly 
sequence-similar (>40%) templates as search models. In cases of 
lower sequence similarity, structure prediction tools can frequently 
help build comparative models that give better molecular replace- 
ment solutions; however, the success rate drops rapidly as the tem- 
plate sequence identity falls below 30%*”. In cases where structurally 
similar experimental models are not available, ab initio phasing tech- 
niques have had some success for targets with simple folds of high 
symmetry’* or with new structures that have been rationally designed 
from first principles’, but ab initio phasing of diffraction data for 
natural globular proteins remains an unsolved problem. 

In this study, we present a new energy-based rebuilding-and- 
refinement method that consistently improves models derived from 
NMR, from sequence-distant templates, and from de novo folding 
methods. The final models include high-resolution features not pre- 
sent in the starting models, including the packing of core side chains. 
Bringing together these results from all-atom structure prediction 
with state-of-the-art algorithms for molecular replacement and 
automated rebuilding'’®’, we show that distant-template-based 
and de novo models can reach the accuracy required to solve the 
X-ray crystallographic phase problem. 


Targeted rebuilding-and-refinement protocol 


We have developed a new approach for refining protein models that 
combines the targeting of aggressive sampling to regions most likely 
to be in error with powerful global optimization techniques. The new 
protocol is outlined in Fig. 1a. The first step of this protocol is the 
energy-based optimization of an input ensemble of models using 
the previously described Rosetta all-atom refinement method. This 
method combines Monte Carlo minimization with side-chain remo- 
delling to relieve inter-atomic clashes and to optimize side-chain 
packing and hydrogen bonding, as encoded by an all-atom force 
field’**. Briefly, in each Monte Carlo move, a random perturbation 
to the protein backbone torsion angles is followed by discrete 
optimization of the side-chain conformations'*’°, which allows efficient 
crossing of side-chain torsional barriers. Then, quasi-Newton 
optimization of the side-chain and backbone torsion angles is carried 
out before the decision on whether to accept the move. Because of the 
final minimization, each point on the landscape is mapped to the 
closest local minimum, flattening energy barriers'®. Although making 
it possible to recognize near-native predictions based on their low 
energies’’'*, this all-atom refinement alone does not consistently 
produce significant improvements in model quality (Supplemen- 
tary Fig. 1). 
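
The Monte Carlo minimization loop described above (perturb, minimize, then accept or reject) can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the Rosetta protocol: the one-dimensional energy function, the plain gradient descent used in place of quasi-Newton optimization, and all parameter values are hypothetical, and the side-chain remodelling step is omitted.

```python
import math
import random


def energy(x: float) -> float:
    """Toy rugged landscape: a shallow bowl with many local minima."""
    return 0.05 * x * x + math.sin(3.0 * x)


def local_minimize(x: float, step: float = 0.01, iters: int = 500) -> float:
    """Gradient descent with a numerical gradient.

    Stand-in for the quasi-Newton minimization mentioned in the text.
    """
    h = 1e-5
    for _ in range(iters):
        grad = (energy(x + h) - energy(x - h)) / (2.0 * h)
        x -= step * grad
    return x


def monte_carlo_minimization(x0, n_moves=200, kick=2.0, kT=0.5, seed=0):
    """Perturb, minimize, then apply the Metropolis criterion.

    Because every trial is minimized before the accept/reject decision,
    the search effectively moves between local minima.
    """
    rng = random.Random(seed)
    current = local_minimize(x0)
    e_current = energy(current)
    best, e_best = current, e_current
    for _ in range(n_moves):
        trial = local_minimize(current + rng.uniform(-kick, kick))
        e_trial = energy(trial)
        delta = e_trial - e_current
        if delta <= 0 or rng.random() < math.exp(-delta / kT):
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = current, e_current
    return best, e_best


if __name__ == "__main__":
    x_best, e_best = monte_carlo_minimization(6.0)
    print(f"lowest-energy minimum found: x = {x_best:.3f}, E = {e_best:.3f}")
```

Comparing minimized energies rather than raw energies is what "flattens" the barriers: a move is judged by the basin it lands in, not by the height of the point first reached.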

The second step in the new protocol is the identification of regions 
of variation in the ensemble of refined models. We have found a 
marked correlation between the extent of variation in the coordinates 
of a residue in the refined structures and the deviation of the coordi- 
nates of the residue in the refined models from the native structure. 
An example is shown in Fig. 1b, c: positions exhibiting small variance 
across the models are usually quite close to the correct structure, 
whereas positions for which the variance is large often deviate con- 
siderably from the native structure. This correlation arises from the 
relatively short range of the force field and the energy gap between the 
native structure and the models: because the energy of the entire 
system is roughly equal to the sum of its parts, for most portions 
of the protein, the correct conformation will be lower in energy than 
non-native conformations. Regions of the protein that can access the 
native conformation are likely to converge on this conformation and 
thus exhibit less variation, whereas locally incorrect conformations 
are likely to be spread throughout the landscape and exhibit more 
variation. We observe this correlation for many different proteins in 
both the cartesian coordinates and the internal torsion angles; a 
related principle has recently been used in the Pcons method for 
assessing protein models’”. 
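
The idea of using ensemble variation to flag likely errors can be sketched as follows. This is an illustrative assumption about how such a measure could be computed, not the authors' implementation: the models are assumed to be already superimposed, each residue is represented by a single coordinate, and the cutoff value is arbitrary.

```python
import math


def per_residue_variation(ensemble):
    """RMS fluctuation of each residue about its ensemble-mean position.

    `ensemble` is a list of models; each model is a list of (x, y, z)
    tuples, one per residue, already superimposed on a common frame.
    """
    n_models = len(ensemble)
    n_res = len(ensemble[0])
    variation = []
    for i in range(n_res):
        mean_pos = [sum(m[i][k] for m in ensemble) / n_models for k in range(3)]
        msd = sum(
            sum((m[i][k] - mean_pos[k]) ** 2 for k in range(3))
            for m in ensemble
        ) / n_models
        variation.append(math.sqrt(msd))
    return variation


def high_variation_segments(variation, cutoff):
    """Contiguous residue ranges (start, end), inclusive, above the cutoff."""
    segments, start = [], None
    for i, value in enumerate(variation):
        if value > cutoff and start is None:
            start = i
        elif value <= cutoff and start is not None:
            segments.append((start, i - 1))
            start = None
    if start is not None:
        segments.append((start, len(variation) - 1))
    return segments


# Toy ensemble: three models of a five-residue chain; residues 2 and 3 vary.
toy_ensemble = [
    [(0, 0, 0), (1, 0, 0), (2, 0.0, 0), (3, 0.0, 0), (4, 0, 0)],
    [(0, 0, 0), (1, 0, 0), (2, 1.5, 0), (3, -2.0, 0), (4, 0, 0)],
    [(0, 0, 0), (1, 0, 0), (2, -1.5, 0), (3, 2.0, 0), (4, 0, 0)],
]
toy_variation = per_residue_variation(toy_ensemble)
print(high_variation_segments(toy_variation, cutoff=0.5))
```

Segments returned by a function like `high_variation_segments` would be the candidates for the aggressive rebuilding described in the next step.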

The third step in the new protocol targets aggressive sampling to 
the regions most likely to be in error. A fragment-based segment 
rebuilding method (see Supplementary Material) is used to rebuild 
completely regions of models with relatively high variation in the 
model population. Because the precise regions that are incorrect 
cannot be identified unambiguously, we carry out many independent 
calculations in which different segments in the higher variation 
regions are randomly selected for complete rebuilding. The partially 
rebuilt models are then subjected to the Rosetta all-atom refinement 
protocol described above'*". In the segment rebuilding process, side 
chains are initially represented as soft interaction centres and the 
connectivity of the chain is temporarily broken, thus permitting 
the traversal of much larger barriers than those crossed by all-atom 
refinement alone. 

As indicated in Fig. 1a, if the lowest energy refined structures have 
not converged, the rebuilding-and-refinement protocol is applied 
iteratively using a selection process inspired by natural evolution to 
guide convergence on the global minimum. At each iteration, a sub- 
set of models that are low in energy yet structurally diverse is chosen 
to seed the next round; the regions to be rebuilt are determined on the 
basis of the backbone variation in the selected population. Bringing 
together ideas from tabu search’* and conformational space anneal- 
ing'’, the selection process alternates between the propagation of a 
structurally diverse population into the next round (diversification) 
and focusing in on the lowest energy regions of the energy landscape 
explored thus far (intensification). The lowest energy models after 
ten iterations are selected as the final predictions. As illustrated in 
Fig. 1d, models with progressively lower energies and more native- 
like structures can be obtained with increasing number of iterations; 
results on a number of refinement problems are summarized in 
Supplementary Fig. 2. 
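
One simple way to picture the choice of "low in energy yet structurally diverse" seeds is a greedy filter over energy-sorted models. The sketch below is an assumption made for illustration only; the selection scheme actually used is described in the paper's Supplementary Material, and the function names, distance measure and thresholds here are hypothetical.

```python
import math


def rmsd(a, b):
    """Root-mean-square difference between two equal-length coordinate lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def select_seeds(models, n_seeds, min_rmsd):
    """Greedily pick low-energy models that differ by at least `min_rmsd`.

    `models` is a list of (energy, coords) pairs, with coords a flat list
    of numbers. A large `min_rmsd` favours diversification; `min_rmsd = 0`
    reduces to taking the lowest-energy models (intensification).
    """
    seeds = []
    for model_energy, coords in sorted(models, key=lambda m: m[0]):
        if all(rmsd(coords, kept) >= min_rmsd for _, kept in seeds):
            seeds.append((model_energy, coords))
        if len(seeds) == n_seeds:
            break
    return seeds


# Toy population: (energy, flattened coordinates).
population = [
    (-10.0, [0.0, 0.0, 0.0]),
    (-9.5, [0.1, 0.0, 0.0]),
    (-9.0, [3.0, 0.0, 0.0]),
    (-8.0, [0.0, 3.0, 0.0]),
    (-7.0, [3.1, 0.0, 0.0]),
]
diverse = select_seeds(population, n_seeds=3, min_rmsd=1.0)
focused = select_seeds(population, n_seeds=3, min_rmsd=0.0)
print([e for e, _ in diverse], [e for e, _ in focused])
```

Alternating between the two settings across iterations mimics, in a very reduced form, the switch between diversification and intensification described in the text.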


Improving NMR models 


As a first test of the new rebuilding-and-refinement method, we 
sought to improve the accuracy of protein structure models derived 
from moderate-resolution NMR experiments. NMR is an important 
method for determining structures of proteins at atomic resolution 
that has the advantage of not requiring crystals. In some cases, how- 
ever, NMR models can contain errors due to either insufficient data 
or ambiguities in interpretation of the input NMR spectra”. We 
applied the method outlined in Fig. 1a to ten ensembles of NMR 
models deposited in the Protein Data Bank (PDB) for which 
independently determined high-resolution X-ray crystal structures 
provide tests of model accuracy*’*. Regions with high variation in 
the initial all-atom refined ensembles were stochastically rebuilt, as 
were regions assessed as poorly packed (see Methods), to allow for 
possible over-convergence of the initial NMR ensemble in regions with 
incorrect constraints. 

Figure 1 | Overview of the rebuilding-and-refinement method. a, Schematic 
diagram of the rebuilding-and-refinement method applied to structures 
from NMR, from comparative modelling (CM) and from de novo (DN) 
modelling approaches. b, Strong correlation between the per-residue 
backbone conformation variation in the model ensemble and the deviation 
from the native structure for target T0199 from the sixth critical assessment 
of structure prediction (CASP6). c, Superposition of the native structure of 
CASP6 target T0199 with 50 low-energy all-atom refined models. The native 
structure backbone is shown as a thick line, and the models are shown as 
thinner lines. Residues in the native structure are coloured by the average 
per-residue Cα r.m.s. deviation to the native from 4.5 Å (red) to 0.5 Å (blue). 
d, Iterative rebuilding and refinement yields low-energy native-like models. 
The energy and the Cα r.m.s. deviation of models generated during three 
iterations of the loop-relax protocol are displayed for iteration 1 (green), 
iteration 4 (red) and iteration 7 (black). The Rosetta all-atom energy 
includes the enthalpy plus the solvation contribution to the entropy but not 
the configurational entropy. 


©2007 Nature Publishing Group 


NATURE|Vol 450|8 November 2007 

In eight of the ten cases, the lowest energy refined model was closer 
to the crystal structure than any member of the starting NMR 
ensemble (typically 20 members) in terms of backbone agreement, 
as assessed by GDT-HA (geometric distance test (high accuracy)”’). 
Comparison of the best of five lowest energy refined models to the 
NMR ensemble indicates improvement in backbone accuracy and 
core packing in all cases (see Table 1 and Supplementary Figs 3 and 
4). In addition, the quality of the lowest energy models was consis- 
tently better than the starting NMR models in terms of clash 
score, number of rotamer outliers and number of backbone 
(Ramachandran) outliers, as assessed by the MolProbity server 
(Supplementary Table 2)**. Four examples of this energy-based 
structural improvement are shown in Fig. 2a—d. It should be noted 
that no NMR data were included in these rebuilding-and-refinement 
tests; judicious use of experimental NMR information to focus all- 
atom refinement (for example, using inferential structure determina- 
tion”) could yield still better results. 

As noted above, NMR structures often do not give good molecular 
replacement models for crystallographic data®, and we hypothesized 
that the all-atom refined models would yield better solutions. Indeed, 
we found such improvement in molecular replacement scores for all 
eight cases in which diffraction data were publicly available (Table 1), 
using the sensitive and widely used Phaser software'®. Furthermore, 
using phases from the molecular replacement trial with the highest 
translation function Z-score, electron density maps were generated 
and in seven of the eight cases the widely used ARP/wARP"! or 
RESOLVE” automatic map tracing programs could build the majo- 
rity of the residues with no human intervention (Table 1). An 
example of the improvement in density is shown in Fig. 3a, b. 
These results suggest that all-atom rebuilding and refinement may 
be a powerful supplement to existing strategies of trial-and-error 
trimming of NMR ensembles to improve molecular replacement 
solutions for crystallographic data’. 


Improved blind predictions based on templates 


As a further challenging test, we used the new energy-based rebuilding- 
and-refinement method to make blind structure predictions for 26 
proteins with lengths less than 200 residues that had distant homo- 
logues (sequence identity lower than 30%) with known structure 
during the seventh Critical Assessment of Techniques for Protein 
Structure Prediction (CASP7). Ensembles of starting models based 
on different alignments to one or more of these distant homologues 
were generated as described in the Supplementary Information, and 
the rebuilding-and-refinement protocol was carried out with several 
rounds of iteration to explore more broadly conformational space 
(Fig. 1a). Five representative low-energy structures from the final 
population were submitted to the CASP organizers. For 18 of the 26 
cases, at least one of these 5 models was closer to the correct structure 
than the closest homologous structure in the PDB, as assessed by the 
GDT-HA score”. Marked improvement was observed in seven cases, 
with a 10-30% increase in this measure of model quality (see Table 1). 


Table 1 | Improvement of model accuracy and molecular replacement by a rebuilding and refinement protocol 

     X-ray structure  Starting  Length    Sequence      GDT-HA§             TFZ|| in molecular  Auto-traced residues
                      model*    (n)†      identity to                       replacement         (backbone, side chain)¶
                                          best template
                                          (%)‡          Best      Refined   Best      Refined   Best       Refined
                                                        template  model     template  model     template   model
NMR  1hb6             2abd      86        N/A           0.58      0.79      4.1       13        2, 0       80, 80
     1who             1bmw      94        N/A           0.59      0.68      5.7       8.3       25, 12     47, 44
     1gnu             1kot      119       N/A           0.64      0.73      6.6       10.6      62, 53     82, 78
     1a19             1ab7      89 (2)    N/A           0.63      0.78      3.7       8.8       31, 20     48, 37
                                                                            4.5       12.5      4, 0       44, 35
     1fvk             1a24      189 (2)   N/A           0.49      0.69      3.4       6.9       66, 50     97, 91
                                                                            4.3       12.4      55, 43     85, 68
     1mzl             1afh      93        N/A           0.60      0.66      4.6       5.1       36, 29     58, 44
     1tvg             1xpw      143       N/A           0.63      0.74      4.3       6.7       5, 6       103, 86
     2snm             2sob      97        N/A           0.45      0.48      3.8       4.8       17, 16     43, 37
     1agr             1ezy      129       N/A           0.49      0.76      N/A#                N/A#
     1abq             1awo      56        N/A           0.58      0.83      N/A                 N/A☆
CM   2hhz (T0331)     1ty9A     149       14.5          0.49      0.58      5.4       8.8       28, 24     68, 63
     2hr2 (T0368)     2c21C     158 (6)   14.8          0.57      0.67      6.0       5.4       37, 37     20, 14
     2hq7 (T0380)     2fhgA     145 (2)   25.4          0.58      0.69      4.4       6.6       47, 23     92, 83
                                                                            4.6       14.2      30, 17     60, 59
     2ib0 (T0385)     1jgcB     170 (2)   7.8           0.62      0.69      5.1       7.9       63, 37     56, 56
                                                                            5.8       15.5      50, 2      52, 52
     2hi0 (T0329_D2)  1rglA     92 (2)    8.8           0.52      0.67      N/A#                N/A#
     2hcf (T0330_D2)  1lvhB     75        14.1          0.51      0.65      N/A#                N/A#
     2hi6 (T0357)**   1aco      132       8.4           0.45      0.52      N/A**               N/A**
DN   2hh6 (T0283)     2b2j      112       3.6           0.22      0.64      5.4       9.0       26, 12     112, 112

* PDB accession numbers for the closest previously known template (comparative modelling (CM) and de novo modelling (DN)) or for the NMR structure. 

† Length of sequence in crystal structure (number of monomers in asymmetric unit, n). 

‡ Number of sequence-identical residues across regions structurally aligned within 4 Å* divided by the length of the shorter sequence. 

§ Fraction of residues in model superimposable on crystal structure with high accuracy (see Supplementary Information and ref. 23). This value is the average of four numbers: the numbers of residues aligned between model and experimental structure within 0.5 Å, within 1 Å, within 2 Å and within 4 Å. For the CM cases, GDT-HA was determined for the residues structurally aligned between the native structure and the closest template. For the NMR cases, the GDT-HA comparison presented for the best template is between the first member of the deposited NMR structural ensemble and the crystal structure. 

|| Z-score of Phaser log-likelihood translation function for molecular replacement solution. For CM and DN cases, molecular replacement for the best template was carried out using a mixed-model based on the best possible structural alignment between the native structure and template structure*; no such alignment was carried out for the refined model, however. The TFZ scores for the next best model submitted by all other CASP7 predictors were 5.4 (T0331), 6.0 (T0368), 4.4 (T0380), 5.1 (T0385) and 6.9 (T0283). For NMR cases, the presented results are from molecular replacement with the full deposited NMR ensemble and from each of the lowest energy 25 refined models (see also Supplementary Table 1). In the NMR cases, the best-TFZ structure from the deposited ensemble (see Supplementary Table 1) typically gave slightly worse results in subsequent automatic tracing than using the full ensemble, as expected®. In cases with multiple monomers present in the asymmetric unit, Z-scores for each monomer are presented, except for T0368, for which decreasing TFZ scores for molecular replacement of additional monomers after the first one indicated the solutions to be ambiguous. 

¶ Number of automatically traced residues starting with molecular replacement phases given by Phaser that match the deposited crystal structure within 2 Å. In all cases, tracing and refinement was carried out with the ARP/wARP" and RESOLVE” programs, with the better results from the two programs presented. 

# Predicted model is for the smaller of two domains present in the crystal structure and is thus not sufficient for molecular replacement. 

☆ Structure factors not deposited in the PDB. 

** Solved by NMR spectroscopy. 

This is a particularly notable result because improving on the best 
template structure has been a long standing challenge for comparative 
modelling—owing to the high dimensionality of conformational 
space, there are many more ways to degrade a reasonably accurate 
model than to improve it. Superpositions of the closest homologous 
structure, the submitted refined models and the native structure for 
cases with the greatest improvement are shown in Fig. 2e-h. The 
improvement in the refined structures is evident even in core second- 
ary structural elements. 

Out of the seven high-resolution predictions, there were four tar- 
gets for which diffraction data were available and the modelled 
sequence constituted the entire crystallized construct, enabling tests 
of molecular replacement. In each of these cases, we found that the 
best previous templates in the PDB failed to produce clear-cut 
molecular replacement solutions (that is, Phaser Z-scores greater than 7), 
even after using knowledge of structurally alignable regions and 
a side-chain truncation approach to trim back the search models 
to their most accurate atoms*. Other template-based models sub- 
mitted to CASP7, based on methods that typically did not use 
aggressive all-atom refinement, gave similarly low molecular replace- 
ment scores (Table 1). For three of the four cases, however, the 
refined models that we submitted for CASP7 gave significantly better 
molecular replacement solutions than the best template (Table 1). 
For these targets, the maps produced by combining phases from 
the blindly predicted model with the experimental diffraction ampli- 
tudes were of sufficient quality to permit the automatic chain-tracing 
program RESOLVE” to build a large fraction of each structure 
with high accuracy (Table 1). An example of the marked improve- 
ment in electron density on using the refined models is shown in 
Fig. 3c, d. 


Ab initio phasing by ab initio modelling 

To the best of our knowledge, a de novo structure prediction for a 
natural protein with an asymmetric, globular fold has never been 
used successfully for molecular replacement. However, the accuracy 
of de novo prediction methods has been improving rapidly. In particular, 
the use of all-atom refinement to follow low-resolution 
modelling by the Rosetta de novo modelling method” led to several 
blind predictions in CASP7 for proteins of all-α, all-β and α+β 
secondary structure classes that placed most of the backbone ele- 
ments and core side chains with high accuracy (see Fig. 4a—c)”. 
This progress in de novo modelling, along with the successes above 
with refined NMR and template-based models, encouraged us to 
attempt molecular replacement with an exceptional prediction for 
the 112-residue α-helical CASP7 target T0283. 

The best of five models for T0283 blindly predicted without the use 
of templates matched the subsequently released crystal structure 
(2hh6**) with a Cα r.m.s. deviation of 1.4 Å over 90 residues (Fig. 4c). 
The closest previously known fold in the PDB, identified from structure 
superpositions by CASP7 assessors (2b2j’’), was significantly different 
from the T0283 crystal structure, aligning 70 residues with a Cα 
r.m.s. deviation of 3.1 Å (note also the poor GDT-HA score in Table 1). 

After truncating the Rosetta prediction to a consensus core (resi- 
dues 10 to 88, for which four of the five submitted models coincided 
to within 2.5 Å Cα r.m.s. deviation), molecular replacement by 
Phaser showed clear features for the omitted amino- and carboxy- 
terminal helices (see Supplementary Fig. 5 and caption). Starting 
from this molecular replacement solution, the ARP/wARP software 
was able to complete the structure automatically, tracing all 112 
residues correctly. The final result (Fig. 4d) is in excellent agreement 
with the structure deposited in the PDB, which used phases experimentally 
derived by selenium single-wavelength anomalous dispersion, 
with an r.m.s. deviation of 0.13 Å for all 112 Cα atoms. In 
contrast, attempts to solve the structure by molecular replacement 
with the closest existing ‘template’ 2b2j failed to produce a clear-cut 
phasing solution (Table 1), even when knowledge of the optimal 
superposition was used to trim this search model back to the 70 
residues that aligned best to the actual structure. It will be of great 
interest to investigate whether this result can be generalized to rapidly 
phase diffraction data for proteins of new folds. 


Figure 2 | Improvement in model accuracy produced by rebuilding and 
refinement. a-d, NMR refinement tests displaying superpositions of the 
crystal structure (blue), model 1 of the NMR ensemble (red) and the lowest 
energy all-atom refined model (green) for four NMR refinement test cases 
(a, acyl CoA binding protein, 2abd; b, SH3 domain of ABL tyrosine kinase, 
1awo; c, guanine nucleotide binding protein, 1ezy; d, barstar, 1ab7). e–h, Blind 
predictions produced by comparative modelling, displaying superpositions of 
the native structure (blue), the best template in the PDB (red) and the best of 
our five submitted models (green) for four CASP7 targets (e, T0380; f, T0385; 
g, T0330 domain 2; h, T0331). A subset of the core side chains is shown in stick 
representation to illustrate the accuracy of core packing. Figures were 
prepared in PyMOL (Delano Scientific, Palo Alto, California). 
Improving model accuracy and molecular replacement 


The results described here show that an all-atom rebuilding-and- 
refinement protocol can produce protein structure models of high 
accuracy. The iterative protocol outlined in Fig. 1a brings together 
the individually quite powerful global optimization ideas underlying 
Monte Carlo minimization’®, tabu search'* and conformational 
space annealing’” while targeting aggressive sampling to regions most 
likely to be incorrect. The substantial improvements achieved in 
prediction quality—in several cases enabling molecular replacement 
phasing of X-ray diffraction data—suggest that structure prediction 
has matured considerably. Nevertheless, we emphasize that there 
is still considerable room for improvement: our high-resolution 
rebuilding-and-refinement protocol does not always improve start- 
ing models, and T0283 is the only CASP7 target predicted de novo for 
which the models were accurate enough for molecular replacement. 
We look forward to advances in both the energy function, notably the 
addition of configurational entropy, and in conformational sam- 
pling. The significant energy gap between the refined models and 
the refined crystal structure'’ for most of the cases studied here sug- 
gests that sampling is still the primary bottleneck for high-accuracy 
all-atom structure prediction. 

At present, the Protein Structure Initiative lists hundreds of pro- 
teins with lengths less than 200 residues that have been crystallized 
but not yet solved. Publication of diffraction data sets that have not 
yielded to experimental phasing could catalyse the development of 
new hybrid prediction/phasing algorithms, much like the blind 
CASP trials have accelerated progress in the field of structure 
Figure 3 | Improvement in electron density using models from rebuilding 
and refinement in molecular replacement searches. Examples are 
presented for the NMR structure of acyl CoA binding protein 2abd (a, b) and 
CASP7 comparative modelling target T0385 (c and d). Black mesh 
represents electron density (2mFo−DFc, 1.5σ contour) using experimental 
structure factors and phases from molecular replacement with the starting 
model (a and c) or the refined model (b and d). The coordinates deposited in 
the PDB, determined using experimental phase information, are shown in 
stick representation. Note that the ‘refinement’ applied to the models refers 
to the all-atom energy-based protocol (see Fig. 2 and text) and not to 
refinement against the diffraction data. The accurate modelling of side 
chains by Rosetta was critical for the illustrated map improvement; 
molecular replacement trials gave significantly better solutions if the 
Rosetta-predicted side chains were retained rather than truncated. 
prediction. With continuing advances in high-resolution structure 
prediction, in molecular replacement tools, and in the interface 
between these two fields, we expect that in silico phasing will become 
an increasingly important component of the crystallographer’s 
toolkit. 

In the present study, aggressive all-atom refinement was carried 
out in the absence of any experimental information. The incorpora- 
tion of experimental data into the rebuilding-and-refinement pro- 
tocol could help overcome the current shortcomings in both the 
energy function and conformational sampling and allow more con- 
sistent high-resolution structural inference. In practical applications 
to molecular replacement trials, the diffraction data do not need to 
be set aside as a stringent post facto test of model accuracy, as was 
carried out in this study. Diffraction data without phases would be 
useful in screening larger numbers of trial structures for molecular 
replacement or in complementing the physical energy terms with 
diffraction-data-derived likelihood scores* during rebuilding and 
refinement. Weak phase information, for example based on anom- 
alous scattering from intrinsic sulphur atoms”, could also be 
exploited, for instance by using an initial molecular replacement 
model to locate the anomalous scatterer sites'®. Although not used 
in the present study, NMR chemical shift, nuclear Overhauser effect, 
and residual dipolar coupling data can help to pinpoint regions of 
the models to rebuild and regions to constrain during all-atom 
refinement. On a larger scale, mass spectrometry techniques coupled 
with hydrogen/deuterium exchange’’, chemical cross-linking*' and 
radical footprinting’ show great promise for providing high- 
throughput, residue-level information that may rapidly constrain 
structure prediction and, in the absence of crystallographic data, 
help validate models. We anticipate that the combination of high- 
resolution modelling with limited experimental structural data will 


Figure 4 | Ab initio phasing by ab initio modelling. a—c, Superpositions of 
blind Rosetta de novo structure predictions (green) and the subsequently 
released crystal structures (blue) for CASP7 targets T0354 (a), domain 3 of 
T0316 (b) and T0283 (c). Buried side chains and backbone-aligned residues 
are displayed. d, Electron density map (2mFo−DFc; 2σ contour) produced 
by automatic refinement of the molecular replacement solution obtained 
from the T0283 structure prediction (black mesh; 1σ contour) agrees with 
the coordinates deposited in the PDB (red), solved with experimental phase 
information. The electron density map immediately after molecular 
replacement is shown in Supplementary Fig. 5. 
become an increasingly powerful approach for characterizing the 
structures of biological macromolecules and complexes in the years 
to come. 


METHODS SUMMARY 


Models produced using NMR data, comparative modelling and de novo struc- 
ture prediction were refined using the targeted rebuilding-and-refinement 
protocol introduced in this paper. To assess accuracy, the resulting models 
were compared to high-resolution crystal structures by the GDT-HA (geometric 
distance test (high accuracy)) score’, the average percentage of Cα atoms 
agreeing within 0.5, 1.0, 2.0 and 4.0 Å. As a final test of accuracy and of practical 
utility, models were screened for suitability in phase estimation for crystal- 
lographic diffraction data using the Phaser molecular replacement software’®. 
The widely used ARP/wARP" and RESOLVE” programs were then used to 
refine automatically the electron density maps and build density-constrained 
protein coordinates. 
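
The GDT-HA score defined above can be illustrated with a minimal Python sketch. It assumes that the model has already been superimposed on the crystal structure and that per-residue Cα distances are supplied directly; the full GDT calculation also searches over superpositions, which is omitted here. The function name gdt_ha and the example distances are hypothetical.

```python
def gdt_ha(ca_distances, cutoffs=(0.5, 1.0, 2.0, 4.0)):
    """Average fraction of C-alpha atoms within each cutoff (angstroms).

    Simplification (assumption): distances come from one fixed
    superposition, whereas the real GDT calculation maximizes each
    count over many superpositions.
    """
    n = len(ca_distances)
    if n == 0:
        raise ValueError("need at least one residue")
    fractions = [sum(d <= c for d in ca_distances) / n for c in cutoffs]
    return sum(fractions) / len(cutoffs)


# Hypothetical distances for a four-residue model.
print(gdt_ha([0.3, 0.8, 1.5, 5.0]))
```

The score is reported here as a fraction, as in Table 1; multiplying by 100 gives the percentage form used in the text.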


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS 


We present detailed descriptions of six methods discussed in the main text: (1) 
rebuilding-and-refinement protocol; (2) identification of regions to rebuild 
from the NMR structure ensemble; (3) preparation of blind predictions; (4) 
metrics for comparing models with crystal structures; (5) screening of models 
for suitability for molecular replacement; (6) assessing model quality with 
MolProbity. 

Rebuilding-and-refinement protocol. We describe below the three key steps of 
the rebuilding-and-refinement protocol: segment rebuilding, all-atom refine- 
ment and iterative evolution. 

For the first of these three steps, we used a new segment rebuilding protocol to 
rebuild regions with high structural variation in the model population, as these 
regions are often incorrect (see, for example, Fig. 1b). Because of uncertainties in 
the precise locations of incorrect regions, the portions of the model to be rebuilt 
were chosen stochastically from the regions with high variance at the beginning 
of each simulation. Up to 90% of all the separate regions were rebuilt in a given 
run—this allows for compensatory changes in interacting segments to occur. 
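
The stochastic choice of segments described above might be sketched as follows. This is an illustration only: the variation threshold, the uniform draw of how many regions to rebuild, and the function names are assumptions, not details taken from the Rosetta implementation.

```python
import random


def variable_regions(per_residue_variation, threshold):
    """Return (start, end) index pairs of contiguous residues above threshold."""
    regions, start = [], None
    for i, v in enumerate(per_residue_variation):
        if v > threshold and start is None:
            start = i
        elif v <= threshold and start is not None:
            regions.append((start, i - 1))
            start = None
    if start is not None:
        regions.append((start, len(per_residue_variation) - 1))
    return regions


def pick_regions(regions, max_fraction=0.9, rng=random):
    """Stochastically pick regions to rebuild, capped at max_fraction of them."""
    if not regions:
        return []
    cap = max(1, int(max_fraction * len(regions)))
    k = rng.randint(1, cap)  # assumed: number of regions drawn uniformly
    return sorted(rng.sample(regions, k))
```

Rebuilding several regions in the same run is what allows compensatory changes in interacting segments.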

The coordinates in the region to be rebuilt were generated using the Rosetta 
fragment-insertion-based de novo folding protocol”*. After each fragment inser- 
tion, the decision to accept or reject was made according to the standard 
Metropolis criterion based on the total energy of the system. To maintain the 
connectivity of the protein chain, cyclic coordinate descent (CCD*’) was used to 
close the chain break at a stochastically selected position of the region rebuilt. 
The rebuilding process was divided into ten stages. At each successive stage, an 
increasing chain-break score (a penalty to the deviation of the peptide bond 
length at the chain break from the ideal peptide bond length) was applied. In 
each of the first five stages, the number of fragment insertion trials was ten times 
the number of residues in the region being rebuilt. In a fragment insertion trial, 
randomly chosen nine-residue, three-residue, or one-residue fragments were 
inserted into randomly chosen positions in the region being rebuilt, and the 
Metropolis Monte Carlo criterion was used to accept or reject the newly inserted 
fragment based on the Rosetta low-resolution energy function”. In each of the 
five last stages, in addition to the fragment insertion trials, we also performed 
cyclic-coordinate-descent-based backbone torsion angle moves (CCD moves) in 
which the cyclic coordinate descent solution was calculated and the backbone 
torsion angles for five randomly picked positions in the region being rebuilt were 
modified according to the CCD solution. 
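
A minimal sketch of the Metropolis test and the ten-stage rebuilding schedule is given below. The linear ramp of the chain-break weight, the maximum weight of 1.0 and the use of the same number of fragment trials in the last five stages are assumptions made for illustration; the text specifies only that the weight increases and that the first five stages use ten trials per residue.

```python
import math
import random


def metropolis_accept(delta_e, temperature, rng=random):
    """Accept downhill moves always; uphill with probability exp(-dE/T)."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)


def rebuilding_schedule(n_residues, n_stages=10, max_break_weight=1.0):
    """Yield (stage, chain_break_weight, n_fragment_trials, use_ccd_moves)."""
    for stage in range(1, n_stages + 1):
        weight = max_break_weight * stage / n_stages  # assumed linear ramp
        trials = 10 * n_residues                      # ten trials per residue
        use_ccd = stage > n_stages // 2               # CCD moves in last five stages
        yield stage, weight, trials, use_ccd


for row in rebuilding_schedule(n_residues=12):
    print(row)
```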

If after the ten rebuilding stages described above any chain break remained 
larger than 0.2 Å, the region to be rebuilt was expanded by one residue on both 
sides. The above fragment insertion and chain-break closing process was 
repeated using a harmonic tether to the starting values of the torsion angles in 
the newly included regions (which may fall into regions with low variance in the 
starting population) and another stochastically selected chain-break position. 
The regions to be rebuilt were allowed to expand by up to five residues upstream 
and downstream of the original starting and ending positions, until chain closure 
was achieved. This procedure was usually sufficient to ensure the recovery of a 
continuous peptide chain. In very rare cases where the chain could not be closed 
in a rebuilt region, it was merged with an adjacent region to be rebuilt along with 
the fixed portion of the model between these two regions and the rebuilding 
process was repeated. With the added flexibility of a larger region being rebuilt, 
the peptide chain could essentially always be closed. Variable regions at the chain 
termini were rebuilt using the fragment insertion-based de novo protocol with- 
out steps for chain-break closure. 
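
The expansion logic for regions that fail to close can be sketched as follows. The callback try_close is hypothetical: it stands in for the fragment insertion and chain-break closing process and is assumed to return the remaining chain-break size in ångströms.

```python
def rebuild_with_expansion(start, end, n_residues, try_close,
                           max_gap=0.2, max_expand=5):
    """Grow the rebuilt region one residue per side until the chain closes.

    try_close(start, end) is a hypothetical callback that rebuilds the
    region and returns the remaining chain-break size in angstroms.
    Returns the closed (start, end) region, or None if closure fails.
    """
    for expand in range(max_expand + 1):
        s = max(0, start - expand)
        e = min(n_residues - 1, end + expand)
        if try_close(s, e) <= max_gap:
            return s, e
    return None  # the caller would merge with an adjacent region and retry
```

Returning None corresponds to the very rare case in which the region is merged with a neighbouring region and rebuilt again.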

The segment rebuilding protocol is implemented in the ‘loop_relax’ subrout- 
ine in the freely available Rosetta source code. 

The segment rebuilding protocol described above aggressively employs frag- 
ment insertion moves to sample a broad range of conformations. The all-atom 
refinement protocol—the second key step of the rebuilding-and-refinement 
protocol—then searches for local minima in the vicinity of the structures pro- 
duced by segment rebuilding using a detailed all-atom force-field. 

The Rosetta all-atom energy function is largely dominated by short-range 
interactions’, primarily Lennard-Jones interactions, orientation-dependent 
hydrogen bonding, and the Lazaridis–Karplus implicit solvation model**. The 
torsional states of backbone and side chains are evaluated using knowledge- 
based potentials derived from amino-acid-specific Ramachandran maps and 
the rotamer probabilities and χ angle standard deviations in the backbone- 
dependent rotamer library developed by ref. 39. 

During all-atom refinement, all the backbone and side-chain atoms in the 
protein are explicitly represented. The bond lengths and angles are kept fixed at 
ideal values*®, and the polypeptide chain is described in internal coordinates 
(the backbone and side-chain torsion angles). A single move in the all-atom 
refinement protocol consists of the following steps: (1) one of the several types 
of perturbations to the backbone torsion angles described below; (2) greedy 
optimization of the side-chain rotamer conformations (‘rotamer trials’) for 
the new backbone conformation; (3) minimization of the energy with respect to 
either the backbone degrees of freedom only (first half of refinement procedure) 
or backbone and side-chain degrees of freedom (second half of refinement 
procedure) using the Davidon–Fletcher–Powell (DFP) algorithm. The convergence 
criterion for exiting this quasi-Newton minimization was decreased from 
10° to 10° during the course of refinement to enable more complete min- 
imization in the final stages of refinement. (4) The compound move (steps 1-3) 
is accepted or rejected according to the Metropolis Monte Carlo criterion. These 
compound moves extend the Monte Carlo minimization procedure found to be 
quite powerful in previous studies” by incorporating discrete optimization of 
side-chain conformations; this allows energy-directed barrier hopping at the 
level of the side chains. 
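
The structure of one compound move (perturb, minimize, then accept or reject) can be illustrated on a toy one-dimensional energy function. This sketch is not the Rosetta implementation: a crude gradient descent stands in for the quasi-Newton minimizer, the rotamer-trials step is omitted, and all names and parameter values are assumptions.

```python
import math
import random


def minimize(energy, x, step=0.01, iters=200, h=1e-5):
    """Crude gradient descent standing in for the quasi-Newton minimizer."""
    for _ in range(iters):
        grad = (energy(x + h) - energy(x - h)) / (2 * h)  # central difference
        x -= step * grad
    return x


def mcm_step(energy, x, temperature=1.0, max_perturb=1.0, rng=random):
    """One compound move: perturb, minimize, then Metropolis accept/reject."""
    trial = minimize(energy, x + rng.uniform(-max_perturb, max_perturb))
    delta = energy(trial) - energy(x)
    if delta <= 0 or rng.random() < math.exp(-delta / temperature):
        return trial
    return x


def double_well(x):
    """Toy energy with minima at x = -1 and x = +1."""
    return (x * x - 1.0) ** 2


x = 0.8
for _ in range(20):
    x = mcm_step(double_well, x, rng=random.Random(0))
print(x)
```

Because the acceptance test compares minimized energies, the walk moves between local minima rather than between arbitrary points on the landscape.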

The following backbone perturbations are used at step (1) in the Monte Carlo 
minimization move described above and in a previous reference’. The ‘small’ 
and ‘shear’ moves are small perturbations of the backbone at five to ten randomly 
chosen positions. In small moves, φ and ψ are perturbed randomly by up to 1° in 
helix or strand regions or 1.5° in loop regions. In shear moves, φ is perturbed 
randomly by up to 2° in helix or strand regions or 3° in loop regions and the 
preceding ψ is perturbed by the same number of degrees in the opposite direction 
to produce a compensatory shear motion in the peptide plane. The ‘wobble’ and 
‘crank’ moves involve insertion of fragments and are more aggressively perturb- 
ing than the small and shear moves". For both of these move types, the fragment 
set*® is filtered to exclude those which cause a mean square deviation in the 
coordinates of the downstream atoms of more than 60 Å and one of the remaining 
fragments is chosen randomly for insertion. In wobble moves, the torsion 
angles belonging to the three residues immediately following the site of the one- 
or three-residue fragment insertion are varied to minimize the downstream 
perturbation still further. In crank moves, one residue is varied immediately 
after the insertion site, and three more residues at a site spaced by 6-20 residues 
from the fragment insertion site; this produces a ‘crankshaft’-like movement of 
the intervening portion of the chain. ‘Small-wobble’ moves involve an initial 
10–20° random change in the torsion angles of a single residue, followed by 
minimization of the perturbation over the three adjacent residues. The minimization 
of the perturbation in the wobble and crank moves is carried out using the fast 
gradient-based algorithm described previously". After all five move types, the 
side chains are optimized and the energy is minimized as described in the pre- 
ceding paragraph. 
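
The small and shear moves can be sketched as operations on lists of backbone torsion angles. The one-letter secondary-structure codes (H, E, L) and the independent sampling of the φ and ψ perturbations in the small move are assumptions made for this illustration; the perturbation limits are those quoted above.

```python
import random

SMALL_MAX = {"H": 1.0, "E": 1.0, "L": 1.5}  # degrees: helix, strand, loop
SHEAR_MAX = {"H": 2.0, "E": 2.0, "L": 3.0}


def small_move(phi, psi, ss, i, rng=random):
    """Perturb phi and psi of residue i independently, in place."""
    limit = SMALL_MAX[ss[i]]
    phi[i] += rng.uniform(-limit, limit)
    psi[i] += rng.uniform(-limit, limit)


def shear_move(phi, psi, ss, i, rng=random):
    """Perturb phi of residue i and counter-rotate psi of residue i - 1."""
    if i == 0:
        return  # the first residue has no preceding psi
    limit = SHEAR_MAX[ss[i]]
    delta = rng.uniform(-limit, limit)
    phi[i] += delta
    psi[i - 1] -= delta
```

The equal and opposite rotation in the shear move is what keeps the downstream chain largely in place while the peptide plane shifts.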

The all-atom refinement protocol is divided into three stages. The first is 
ramp-up. The ramp-up stage consists of sets of ten small and shear moves 
preceded by combinatorial optimization of the side-chain rotamer conforma- 
tions. The weight on the repulsive part of the Lennard-Jones potential is pro- 
gressively increased from 0.05 to 1.0 over eight such move sets. The gradual 
ramping up of the repulsive weight facilitates a smooth rearrangement of the 
side chains with small perturbations of the backbone and ensures a reasonably 
well-packed low-energy model before the more aggressive second stage. This 
second stage is the aggressive sampling stage: alternating wobble, small-wobble 
and crank compound Monte Carlo minimization moves are carried out; the total 
number of attempts for each move type is equal to the number of residues in the 
protein. A full combinatorial search over side-chain rotamer conformations is 
carried out after every 25 attempts of each type of move. The more aggressive 
nature of the moves used at this stage allows the traversal of modest energy 
barriers. The convergence tolerance for the DFP minimization is set to 10° *. 
The third stage is the fine optimization stage: alternating small and shear moves 
are carried out, again for a total number of attempts equal to the number of 
residues in the protein. The more subtle backbone conformation changes 
brought about by these moves assist convergence on a relatively low-energy local 
minimum. The convergence tolerance for minimization is set to 10 >. After 
these three stages, a final minimization with respect to all degrees of freedom 
is carried out with a convergence tolerance of 10 °. 
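
The ramp-up of the repulsive weight can be written as a short schedule. The text gives only the end points (0.05 and 1.0) and the number of move sets (eight); the linear spacing used below is an assumption.

```python
def repulsive_weights(start=0.05, end=1.0, n_sets=8):
    """Linearly spaced weights; the actual spacing is not given in the text."""
    step = (end - start) / (n_sets - 1)
    return [start + k * step for k in range(n_sets)]


for k, w in enumerate(repulsive_weights(), start=1):
    print(f"move set {k}: repulsive weight {w:.3f}")
```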

The refinement protocol described above is implemented in the ‘fullatom_relax’ 
subroutine in Rosetta; the CPU cost is about 20 min for a 100-residue 
protein on an Intel Pentium IV 1.6 GHz processor. 

The challenge in refinement is to focus sampling on the lowest energy regions 
of the energy landscape identified up to that point while maintaining a broad 
enough search to avoid converging on a local energy minimum. Towards this 
end, we developed a protocol that balances intensification of the search in low- 
energy regions with diversification to maintain subpopulations exploring 
alternative energy minima. The approach—the third key step of the rebuild- 
ing-and-refinement protocol; that is, ensemble evolution by alternate cycles of 
diversification and intensification—adopts the idea of explicit control of the 
search intensity from tabu search", and is a generalization of the conformational 
space annealing (CSA) technique, which has achieved success in a broad range of 
optimization problems”. 
In both the intensification and diversification steps, an input population of 
200 models was clustered using the method described in ref. 13 to identify 
distinct populations of structures. The clustering threshold was chosen such that 
the largest cluster contained 10% of the models. For each cluster, ten models 
were selected (if there were fewer than 10 models in a cluster, all were selected) 
and each model was subjected to nine independent segment rebuilding plus all- 
atom refinement runs initialized with different random number seeds. 

In the diversification stages (iterations 1, 3, 5, 7 and 9), the models in the 
parent population were kept in their original cluster assignment. A newly generated
model was assigned to the closest cluster if the root-mean-squared deviation
over alpha carbons (Cα r.m.s. deviation) between this model and the closest
cluster member was less than the current diversity threshold (see below), and the 
highest energy member of the cluster was thrown away. If the r.m.s. deviation 
between a newly generated model and its closest cluster member was higher than 
the current diversity threshold, then the model with the highest energy in the 
current parent population was thrown away, and the newly generated model 
formed a cluster of its own. This is analogous to speciation in natural evolution. 
As a model is discarded for each new model added, the population size stayed 
unchanged. The diversification step favours a broad exploration of the confor- 
mational space by maintaining the distinct populations of clusters: there is 
competition for low energy within but not between clusters. Combined with 
the initial clustering step, it ensures that the new population will not be domi- 
nated by overly closely related structures, which could result in premature con- 
vergence away from the global minimum. 
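The diversification rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Rosetta implementation: the `Model` class, the `rmsd` callable and the scalar `coords` stand-in are hypothetical, and the sketch assumes that the newly assigned model itself competes with the members of the cluster it joins.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(eq=False)  # compare models by identity, not by field values
class Model:
    energy: float
    coords: float      # hypothetical stand-in for real atomic coordinates
    cluster: int = -1  # -1 means "not yet assigned"


def diversification_update(
    population: List[Model],
    new_model: Model,
    rmsd: Callable[[Model, Model], float],
    threshold: float,
) -> List[Model]:
    """Add one offspring and discard one model, keeping the size constant."""
    closest = min(population, key=lambda m: rmsd(m, new_model))
    if rmsd(closest, new_model) < threshold:
        # Join the closest cluster; competition is confined to that cluster.
        new_model.cluster = closest.cluster
        rivals = [m for m in population if m.cluster == closest.cluster]
        victim = max(rivals + [new_model], key=lambda m: m.energy)
    else:
        # Found a new cluster; the worst parent overall makes room for it.
        new_model.cluster = max(m.cluster for m in population) + 1
        victim = max(population, key=lambda m: m.energy)
    return [m for m in population + [new_model] if m is not victim]
```

Because exactly one model is removed for each model added, the population size is unchanged, and a high-energy cluster can lose members only to its own offspring or to a newly founded cluster.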

In the intensification stages (iterations 2, 4, 6, 8 and 10), all but the lowest 
energy 10% of the entire population (parents plus offspring) is discarded to bring 
the population back to a size of 200. The remaining models from the parent 
population keep their original cluster assignment. A newly generated model was 
assigned to the closest remaining cluster if the r.m.s. deviation between this 
model and the closest cluster member was lower than the current diversity 
threshold; otherwise it formed a new cluster of its own. This stage differs from 
the diversification stage in that the energy-based selection is carried out across all 
clusters and hence higher energy clusters can be eliminated completely. This 
stage allows more thorough exploration of the most promising (lowest energy) 
regions of the energy landscape explored thus far. 

The diversity threshold used to maintain distinct populations and to guide the 
spawning of new populations was reduced at each iteration to allow gradual 
convergence on the global energy minimum. The starting value was the clustering
threshold in the original population, and this was reduced by 0.1 Å at each
iteration. This annealing of the diversity threshold was introduced in the CSA
strategy.

The new parent population generated by the diversification or intensification
procedures was used to seed the next generation, and nine independent segment
rebuilding plus all-atom refinement calculations were again carried out for each 
parent. After ten iterations, the low energy models were clustered and the lowest 
energy models in the largest five clusters were selected as the final predictions. 
The overall iterative procedure took approximately 2,000 CPU hours per target. 
For molecular replacement efforts, this computational effort would probably be 
significantly reduced if phasing trials with diffraction data are used to screen 
models. 
Identification of regions to rebuild from the NMR structure ensemble. The 
test cases for NMR refinement were chosen to be proteins representing different 
fold topologies for which an NMR structure and a high-resolution crystal struc- 
ture (with structure factors deposited in PDB) existed. These were chosen from 
the data sets used by refs 21 and 43. 

For investigations of refinement of NMR structures, we rebuilt two sets of 
regions. The first are regions that vary within the NMR ensemble. As in the 
comparative modelling case, we have observed that regions that vary within 
the NMR ensemble are likely to be the regions that are most different from a 
high-resolution crystal structure. These are most likely loops that are either 
inherently dynamic in the NMR structure or loops that are held in place with 
insufficient restraints. (Applying all-atom refinement to the NMR ensembles 
gave essentially the same list of variable regions (data not shown).) 

The second set of regions are segments that are internally consistent within 
the NMR ensemble but systematically under-packed. To estimate packing we 
used a recently developed packing metric (W. Sheffler, personal communica- 
tion) based on the relative accessible surface areas of groups of atoms. For each 
buried atom, we compute the largest sphere tangent to that atom which can fit 
into empty space within the protein. A group composed of all atoms within 5 Å
of the centre is defined for each sphere. For each group of atoms, the accessible
surface area (SASA) to small and large spherical probes (radii 0.9 Å and 2 Å,
respectively) is computed; given that a ball of atoms has a certain area accessible to a
large sphere, less-accessible area to a small sphere indicates better packing. A 
summary percentile score is computed on the basis of a reference set of crystal 
structures, approximating the fraction of native proteins which are better packed
than the scored structure. 

Preparation of blind predictions. The initial set of template-based models was 
obtained from the 3D-Jury server and subjected to all-atom refinement using
the Rosetta all-atom energy function. Up to ten templates from which the very 
lowest energy models were derived were used as the candidate templates. 
Alignment ensembles between the candidate templates and the target sequence 
were parametrically generated using the K*Sync alignment method. The alignment
ensemble was turned into a model ensemble by placing the sequence of
the query onto the backbone of the parent based on each alignment. Missing 
densities from the insertion and deletion regions of the alignment were 
modelled using the segment modelling protocol described in the ‘rebuilding- 
and-refinement protocol’ section. The full-chain models were then subjected to 
the all-atom refinement procedure as described in the same section, constrained 
by a set of Cα–Cα distance constraints, described next.

The Cα–Cα distance constraints were generated from the 3D-Jury template-
based models with the lowest Rosetta all-atom energies after all-atom refinement.
A Cα–Cα pair was used to derive constraints only when the associated
distance was less than 8 Å in more than 80% of the selected constraint-generating
models. Upper and lower bounds for each of these pairs were determined by
padding the highest and lowest of these distances by one standard deviation of
the Cα–Cα distance distribution function, as described in ref. 46. For computational
efficiency, we further trimmed down the number of constraint pairs by
eliminating neighbouring pairs separated by one or two residues. During all-
atom refinement, a penalty is applied when the Cα–Cα distances in the model
exceed the upper or lower limit of the corresponding constraints. If a distance
exceeds the upper or lower constraint limit by d (in Å), then the penalty E is d²
when d < 0.5 Å, and (d − 0.25 Å) when d ≥ 0.5 Å. The resulting ensemble of low-
energy comparative models became the inputs to further rounds of rebuilding- 
and-refinement (Fig. 1a). 
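As an illustration, the constraint penalty can be written as a small function. The sketch reads the penalty as quadratic for violations below 0.5 Å and linear above, which is the reading that keeps both the value and the slope continuous at d = 0.5 Å. The function name and argument names are hypothetical and are not taken from Rosetta.

```python
def constraint_penalty(distance: float, lower: float, upper: float) -> float:
    """Penalty for a Calpha-Calpha distance (in angstroms) given its bounds.

    Assumed form: d**2 for violations d < 0.5 A, and d - 0.25 for d >= 0.5 A,
    where d is how far the distance lies outside [lower, upper].
    """
    if distance > upper:
        d = distance - upper
    elif distance < lower:
        d = lower - distance
    else:
        return 0.0  # inside the allowed range: no penalty
    return d * d if d < 0.5 else d - 0.25
```

For example, with bounds of 4 Å and 6 Å, a distance of 6.2 Å costs 0.04, whereas a distance of 7 Å costs 0.75; the linear tail keeps a few badly violated constraints from dominating the energy.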

For targets without clear templates identified by the 3D-Jury server, the full
chain was fully modelled by fragment assembly starting from an extended chain, 
followed by the all-atom refinement procedure described above. The conver- 
gence of the Rosetta de novo prediction protocol can differ significantly for 
different sequence representatives of a given fold. For T0283, one of seven
tested sequence homologues gave exceptionally well converged low-energy models
that, after sequence mapping, allowed structure prediction for the target
sequence with the rebuilding-and-refinement protocol.

About 100,000 all-atom refined models were generated for each modelling
target, requiring approximately 100,000 CPU hours. As noted above, for
molecular replacement efforts, this computational effort would probably be 
significantly reduced if phasing trials with diffraction data are used to screen 
models; as the predicted models used in this manuscript were prepared as blind 
predictions for CASP7, such diffraction data were not available at the time of 
modelling. 
Metrics for comparing models with crystal structures. As has been discussed 
previously, no metric for comparing structure models with the crystal structures 
is perfect. In this work, we used three different structural metrics for model
quality assessment. The Cα r.m.s. deviation is a widely used metric for structure
comparison, but it can be distorted by large deviations in a small number of
residues, especially at the termini or in long surface loops. The GDT-HA (geometric
distance test (high accuracy)) score is the average percentage of Cα atoms in the
model within 0.5, 1.0, 2.0 and 4.0 Å of the corresponding Cα coordinates in the
crystal structure; we used TMalign to align the structures. This metric is less
sensitive than the full-chain r.m.s. deviation to deviations in poorly ordered 
termini and long loops, and was used in the CASP7 template-based modelling 
assessment. 
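The GDT-HA definition translates directly into code. The sketch below assumes that the two structures have already been superimposed and that the per-residue Cα distances are supplied as a list; the function name is hypothetical, and "within" is read as "less than or equal to".

```python
from typing import Sequence


def gdt_ha(ca_distances: Sequence[float]) -> float:
    """Average percentage of Calpha atoms within 0.5, 1.0, 2.0 and 4.0 A.

    ca_distances holds one model-to-crystal distance (in angstroms) per
    aligned residue, measured after superposition.
    """
    cutoffs = (0.5, 1.0, 2.0, 4.0)
    n = len(ca_distances)
    percentages = [
        100.0 * sum(1 for d in ca_distances if d <= c) / n for c in cutoffs
    ]
    return sum(percentages) / len(cutoffs)
```

A residue that deviates by 10 Å counts the same as one that deviates by 5 Å, because both simply fail every cutoff. This is why the score is less sensitive than the r.m.s. deviation to a few large excursions.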

The core residue all-atom r.m.s. deviation describes the accuracy of both the 
backbone and side-chain conformation prediction. We used this metric in the 
evaluation of NMR refinement because it can be applied to both the starting 
(NMR ensemble) and ending (Rosetta refined) models. In template-based 
modelling, this metric is not practical as the template usually does not have 
the same amino acid sequence as the target to be modelled. 

In addition, successful molecular replacement using the predicted structure 
can be regarded as a stringent test for model quality assessment, as suggested in 
ref. 49. 

Screening of models for suitability for molecular replacement. Searching for 
molecular replacement solutions involves applying rigid-body transformations 
along the six rotational and translational degrees of freedom. We carried out this 
search with the Phaser software, which is described in ref. 10 and references 
therein. For completeness, the algorithms are briefly summarized here. Phaser 
uses likelihood functions to judge how well molecular replacement models agree 
with the measured diffraction data after they have been first rotated and then also 
translated. Brute-force likelihood calculations over grids of orientations and 
positions are computationally expensive, so fast-Fourier-transform-based
approximations are used to compute sets of possible solutions, which are 
rescored with the full likelihood targets. By using a tree-search-with-pruning 
strategy, almost all solutions that would be found with a full six-dimensional 
search are found, but with a much lower computational cost. As well, this 
strategy allows effective searches for multiple copies, in crystals with more than 
one molecule in the asymmetric unit. For each molecule to be placed, a rotation 
search is first carried out. A translation search is then carried out for each 
plausible orientation. All plausible rotation/translation solutions are checked 
for packing in the lattice, and solutions that pack successfully are subjected to 
rigid body refinement. If more than one copy is present, all plausible partial 
solutions are fixed in turn while carrying out rotation and translation searches 
for subsequent copies. In molecular replacement trials with Phaser, the clearest 
indication of success comes from high values of the Z-score (number of standard 
deviations above the mean), computed by comparing the log-likelihood-gain 
(LLG) for the peak with LLG scores for a random sample of search points. 
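The Z-score described here is an ordinary standardized score. A minimal sketch follows, assuming the LLG values for the random sample of search points are available as a list. This is not Phaser code, and the use of the sample standard deviation is an assumption.

```python
import statistics
from typing import Sequence


def llg_z_score(peak_llg: float, random_llgs: Sequence[float]) -> float:
    """Number of standard deviations by which the peak LLG exceeds the mean
    LLG of a random sample of search points."""
    mean = statistics.mean(random_llgs)
    sigma = statistics.stdev(random_llgs)  # sample standard deviation
    return (peak_llg - mean) / sigma
```

With random-sample LLG values of 1, 2, 3, 4 and 5 and a peak LLG of 11, the function returns a Z-score of about 5.06.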

For molecular replacement in each of the NMR modelling cases, we evaluated 
the combined NMR ensemble as a potential search model and compared these 
results to trials with the 25 lowest energy Rosetta models from rebuilding and 
refinement (Table 1). Furthermore, we have carried out molecular replacement 
trials with each of the members of the deposited NMR ensemble individually, 
with results given in Supplementary Table 1. Finally, for an actual search for a 
good molecular replacement solution, a larger set of models from rebuilding and 
refinement can be screened rapidly. We thus extended the search to the 1,000 
lowest energy models from Rosetta rebuilding and refinement and the results, 
notably improved, are presented in Supplementary Table 1. 

For molecular replacement in comparative modelling cases, we prepared 
search models from the best existing templates and from our comparative 
modelling predictions. For the best templates, we followed the ‘mixed model’ 
protocol described in ref. 4 for optimizing molecular replacement. Furthermore, 
on the basis of the 3DPAIR structure alignment between the native structure
and the best template structure, the template structure was trimmed to contain 
only the structurally alignable regions. Then the native sequence was threaded 
onto the backbone of the corresponding template structure, while retaining the 
side-chain coordinates of the identical residues between the template and native 
sequences. Non-identical side chains longer than serine were mutated to serine, 
followed by the Rosetta side-chain packing protocol to model the mutated serine
and the shorter non-identical side chains, while keeping the identical side-chain 
conformation fixed. To prepare search models for these predictions, we super- 
imposed 100 low-energy models from the final round of refinement, and defined 
the model that has the lowest average r.m.s. deviation to the rest of the models as 
the reference model. Then we calculated the average per-atom distance D
between each of the superimposed models and the reference model. The
Rosetta temperature factor is calculated as T = 8π²D²/3 for each atom and
inserted to the B-factor column of the refined model files. The Rosetta temper- 
ature factor is intended to represent the uncertainty in the final refined models 
after extensive refinement in the Rosetta all-atom force field. As suggested 
earlier, by using the B-factor effectively to smear each atom over its possible
positions, the correlation of the modelled electron density with the true electron
density can be maximized. 
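A short sketch of the temperature-factor calculation follows, assuming the factor has the standard isotropic form 8π²D²/3, where D is the average distance of an atom from its position in the reference model. The function names and the coordinate layout (one list of (x, y, z) tuples per superimposed model) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def mean_atom_distances(
    models: Sequence[Sequence[Coord]], reference: Sequence[Coord]
) -> List[float]:
    """Per-atom distance to the reference, averaged over superimposed models."""
    n_models = len(models)
    return [
        sum(math.dist(model[i], ref_xyz) for model in models) / n_models
        for i, ref_xyz in enumerate(reference)
    ]


def temperature_factor(mean_distance: float) -> float:
    """Pseudo B-factor (in square angstroms) from a mean displacement."""
    return 8.0 * math.pi ** 2 * mean_distance ** 2 / 3.0
```

An atom that sits on average 1 Å from its reference position receives a factor of about 26 Å², and the value grows quadratically with the spread, so poorly converged regions are strongly down-weighted in the search model.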

For the de novo modelling case, target T0283, search models for molecular
replacement were trimmed according to residues for which there was consensus
among submitted models. Supplementary Fig. 5 gives a more detailed descrip- 
tion and illustration of the molecular replacement solution. 
Assessing model quality with MolProbity. For the investigations of refinement 
of NMR models, we used the MolProbity software to investigate the quality
of the refined models versus that of the starting NMR ensemble. For purposes of 
comparison, we chose the lowest energy refined model and the first member of 
the deposited NMR structure. Supplementary Table 2 shows the clash score, 
number of rotamer outliers and number of Ramachandran outliers of the 
NMR and refined models. The refined models consistently have better model 
quality than the starting NMR structure based on these metrics. 
Modulation of Saturn's radio clock by solar wind speed


Philippe Zarka, Laurent Lamy, Baptiste Cecconi, Renée Prangé & Helmut O. Rucker


The internal rotation rates of the giant planets can be estimated by 
cloud motions, but such an approach is not very precise because 
absolute wind speeds are not known a priori and depend on latitude:
periodicities in the radio emissions, thought to be tied to the
internal planetary magnetic field, are used instead. Saturn, despite
an apparently axisymmetric magnetic field, emits kilometre-
wavelength (radio) photons from auroral sources. This emission is
modulated at a period initially identified as 10 h 39 min 24 ± 7 s,
and this has been adopted as Saturn’s rotation period. Subsequent
observations, however, revealed that this period varies
by ±6 min on a timescale of several months to years. Here we report
that the kilometric radiation period varies systematically by 1% 
with a characteristic timescale of 20-30 days. Here we show that 
these fluctuations are correlated with solar wind speed at Saturn, 
meaning that Saturn’s radio clock is controlled, at least in part, by 
conditions external to the planet’s magnetosphere. No correlation 
is found with the solar wind density, dynamic pressure or magnetic 
field; the solar wind speed therefore has a special function. We also 
show that the long-term fluctuations are simply an average of the 
short-term ones, and therefore the long-term variations are prob- 
ably also driven by changes in the solar wind. 

Low-frequency magnetospheric radio emissions have been used 
until now to measure the rotation of giant planets because they are 
produced by keV electrons moving along planetary magnetic field 
lines that are presumed to rotate with the planet’s interior. These
emissions are anisotropic; that is, they are preferentially directed in
a hollow conical beam aligned with the direction of the local magnetic
field. Combined with the rotation of the usually non-axisymmetric
planetary magnetic field, these properties lead to a rotational modulation
of the observed intensity of the emission. At Saturn, the intense
auroral kilometric radiation (SKR) was found in the Voyager era to be
strongly modulated at a period P_SKR = 10 h 39 min 24 ± 7 s, which is
close to that observed for atmospheric cloud features. However,
Saturn’s magnetic field is very nearly axisymmetric, and the auroral
sources are not co-rotating with the planet; rather, they are fixed in
local time. This makes it difficult to understand the strong SKR
modulation without appeal to the existence of a magnetic anomaly
that escaped detection by magnetometers on the Pioneer and Voyager
spacecraft. The uncertainty of ±7 s on P_SKR = 10 h 39 min 24 s
was thought to be limited only by the available time span (nine
months), under the implicit assumption of a constant rotation
period. However, 24 years later, the SKR period measured by the
radio experiment on board Cassini is P_SKR = 10 h 45 min 45 ± 36 s.
The difference of more than 6 min cannot be due to a change in
Saturn’s rotation rate, owing to the large inertia of the planet.
Ulysses and Cassini radio measurements actually showed that
P_SKR continuously varies over the long term (several months to
years), with ~1% relative amplitude.
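The size of the discrepancy between the two quoted periods can be checked with a few lines of arithmetic. The significance estimate assumes independent Gaussian uncertainties, which is an assumption of this sketch and not a statement from the text.

```python
import math


def to_seconds(hours: int, minutes: int, seconds: int) -> int:
    return 3600 * hours + 60 * minutes + seconds


voyager_period = to_seconds(10, 39, 24)   # Voyager-era value, +/- 7 s
cassini_period = to_seconds(10, 45, 45)   # Cassini value, +/- 36 s

difference = cassini_period - voyager_period      # 381 s = 6 min 21 s
relative_change = difference / voyager_period     # about 0.0099, i.e. ~1%

# Combined 1-sigma uncertainty, assuming independent Gaussian errors.
combined_sigma = math.sqrt(7 ** 2 + 36 ** 2)
significance = difference / combined_sigma        # about 10 sigma
```

The difference of 6 min 21 s is roughly ten times the combined quoted uncertainty, so it cannot be attributed to measurement error alone.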


Two models were proposed to explain these variations. The first
invoked an external cause, with nonrandom fluctuations in the solar 
wind speed at Saturn causing SKR source displacement in local time, 
leading to an apparent radio period that is different from the planet’s 
true rotation period. The other invoked an internal cause, namely
mass injection from Enceladus in the magnetosphere’s plasma disk 
and a variable electrodynamic coupling between this disk and 
Saturn’s ionosphere. 

Standard techniques for harmonic signal analysis, such as Fourier 
transform, require a 100-period window to provide 1% accuracy, and 
thus permit only long-term variations to be addressed. Taking
advantage of Cassini’s quasi-continuous radio observations, we 
developed a method to address faster fluctuations. By integrating 
the received flux over the range 100-400 kHz, where most of the 
SKR power is emitted, we obtained a time series of SKR power in 
which one broad peak was observed for each rotation of Saturn (see 
Supplementary Figs 1 and 2). This time series is displayed in Fig. 1 in a
format that reveals variations in the phase of SKR peaks relative to a 
fixed reference period. In addition to the previously noted long-term 
variation, we see quasi-periodic oscillations of the SKR phase—and
thus of the SKR period—on a timescale of 20-30 days. Smoothing the 
SKR time series and cross-correlating consecutive peaks (as described 
in Supplementary Figs 1-3) allowed us to estimate P_SKR with an
accuracy of ±2 min (0.3%) at timescales down to ~1 week. Results
are displayed in Fig. 2a over the 1,186-day interval studied (2003 June
30 to 2006 September 27). Ubiquitous fluctuations of ~2% peak-to-
peak amplitude are detected on a timescale of 20-30 days, superimposed
on the long-term trend measured by previous authors.
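The phase representation used for the SKR peaks can be illustrated with a short sketch. It assumes the fixed reference period of 10 h 48 min (648 min) quoted in the Fig. 1 legend; the function names and the sign convention (a negative drift meaning that peaks arrive progressively earlier) are choices made here for illustration.

```python
def phase_deg(time_min: float, ref_period_min: float = 648.0) -> float:
    """Rotational phase (0-360 deg) of an event relative to a fixed period."""
    return 360.0 * (time_min % ref_period_min) / ref_period_min


def drift_per_rotation_deg(
    true_period_min: float, ref_period_min: float = 648.0
) -> float:
    """Phase shift accumulated per rotation when the true period differs
    from the reference period."""
    return 360.0 * (true_period_min - ref_period_min) / ref_period_min
```

If the emission actually repeats every 646 min, each successive peak slips by about −1.1° against the 648-min reference, so a change of 2 min in the period shows up as a clear change of slope in a phase-versus-time plot within a few tens of rotations.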

What is the origin of these variations? A timescale of 20-30 days is 
characteristic of variations in the solar wind at Saturn, already known 
to control the SKR intensity and power, as can be seen in Fig. 1.
However, the duration of Cassini’s orbits around Saturn has also 
varied between 18 and 30 days since mid-2004. Orbital parameters 
such as the planetocentric distance and latitude of the spacecraft 
affect observations of SKR: decreasing distance increases SKR signal 
strength, and hence detectability, and changes in latitude influence 
SKR visibility as a result of the change in the geometry of observation, 
SKR being emitted from high-latitude sources in conical patterns 
centred on the local magnetic field. Thus, Fig. 2a includes the
fluctuations in the solar wind speed ballistically projected to Saturn; 
variations of spacecraft range and latitude are displayed in Fig. 2b. In 
Fig. 2c we compare the Fourier power spectrum of the fluctuations of 
P_SKR over the entire interval studied with those of the solar wind
speed, spacecraft distance and latitude. Peaks appear in P_SKR fluctuations
at ~21.5, 23.0 and 25.5 days. The last of these coincides with the
same peak in solar wind speed fluctuations. Spacecraft latitude peaks 
with an 18—19-day period. We simulated possible beatings between 
solar-wind-induced and orbit-induced variations by the product 
(versus time) of the solar wind speed multiplied by the spacecraft
latitude. The spectrum of this quantity shows main peaks at ~20.0,
21.5, 24.0 and 26.5 days, all in the same range as (and some of
them matching) the main peaks in P_SKR. Study of subintervals from
our data set confirms these results: within the interval day of the year
(DOY) 2004 = 415-670 (DOY = 1.0 corresponds to 2004 January 1
at 0 h UT), where both distance and latitude are regularly modulated,
strong P_SKR peaks precisely match peaks in solar wind speed, spacecraft
latitude variations, and their product; in the interval between
DOY 280 and 620, after Cassini’s Saturn orbit insertion, the main
peak of both P_SKR and the solar wind speed is at about 22 days. Both
solar wind speed and orbit-dependent viewing geometry therefore 
seem to contribute to observed variations in the SKR period. 

For better quantification of the influence of the solar wind, we 
analysed separately the interval before Cassini’s Saturn orbit inser- 
tion (left of the dotted vertical line in Fig. 2a), over which the SKR 
viewing geometry remained fixed with no expected influence of 
orbital parameters on SKR visibility. We found a linear correlation
coefficient C > 40% between P_SKR and the solar wind speed (Fig. 3a).
For comparison, the well-known correlation between SKR power and 
solar wind speed gives here a similar coefficient C = 44%. The prob- 
ability of obtaining C> 40% with two random data sets of the same 
length as in Fig. 3a is ~10- ’; the correlation found is therefore highly 
significant. A similar study of other subintervals leads to correlation 
coefficients of up to +70% between P_SKR and the solar wind speed
(for example, within the interval between DOY 640 and 760, where 
the spacecraft’s latitude remained fixed and near zero). The lack of a 
perfect phase relationship (C~ 100%) comes from the inevitable 
inaccuracies in the ballistic projection of the solar wind to Saturn 
(never exceeding ±4 days), time-variable solar activity (for example
coronal mass ejections) causing azimuthal variations in the structure
of the solar wind, and details of the interaction between the solar
wind and Saturn’s magnetosphere (the solar wind might function as a
trigger, efficient only when energy has previously been stored in the
magnetosphere, so that SKR peaks may sometimes be ‘missing’).

[Figure 1 axes: phase in rotation (deg) versus DOY 2004]

Figure 1 | Evidence of short-term variations in P_SKR and their relation to
long-term variations. The SKR power-time series (derived as explained in
Supplementary Figs 1 and 2) is displayed here over the 3.25-year interval
studied, in a format similar to that of Fig. 1 in ref. 17. Variations in SKR
power during consecutive rotations are plotted as consecutive vertical lines
with a power scale in grey levels, using an assumed fixed period of
10 h 48 min. Time is in day of the year (DOY) 2004. Data gaps are displayed
in flat grey (major ones are before DOY −137 and at about DOY 191 ± 4 and
377 ± 5). Each rotation is plotted twice for clarity, separated by the white
horizontal line. The origin of phases, and thus the absolute phase, is
arbitrary. Previous authors noted that the fact that the SKR peak wanders
with variable slope over the time interval means that a fixed period does not
organize the SKR modulation well. We see here the same long-term
behaviour. In addition, the quality of our data processing (see the legend to
Supplementary Fig. 1a) reveals quasi-periodic oscillations of the SKR period
on a timescale of 20-30 days. Those are especially clear after DOY 400, where
the long-term drift is small and fewer data gaps are present. The amplitude
of these fluctuations is large (~2% peak to peak), and their long-term
averaging results in the slow, smaller-amplitude (<1% peak to peak over
the studied interval) variation of P_SKR noted by previous authors.

The correlation found above between short-term (<1 month) 
fluctuations in Saturn’s radio period and variations in the solar wind 
speed indicates an external origin for P_SKR variations. As we did not
find any significant correlation between P_SKR and other solar wind
parameters such as density, dynamic pressure (Fig. 3b) or magnetic
field, the solar wind speed must have a special function. This result validates the
assumptions of the model proposed to explain these variations in
terms of SKR source displacement in local time caused by fluctuations
in the solar wind speed. An additional internal cause is not
excluded, which could be another reason for not finding a one-to-
one correlation between P_SKR and the solar wind speed.
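The correlation analysis can be sketched as follows, using the preprocessing described in the Fig. 3 legend: subtraction of a running average and normalization by the standard deviation, followed by a linear (Pearson) correlation coefficient. The helper names, the centred window that is truncated at the edges, and the window length given in samples are assumptions of this sketch, not details from the paper.

```python
import math
from typing import List, Sequence


def running_mean(x: Sequence[float], window: int) -> List[float]:
    """Centred running average; the window is truncated at the edges."""
    half = window // 2
    out = []
    for i in range(len(x)):
        segment = x[max(0, i - half): i + half + 1]
        out.append(sum(segment) / len(segment))
    return out


def detrend_normalize(x: Sequence[float], window: int) -> List[float]:
    """Subtract the running average, then divide by the standard deviation."""
    resid = [xi - mi for xi, mi in zip(x, running_mean(x, window))]
    mean = sum(resid) / len(resid)
    sd = math.sqrt(sum((r - mean) ** 2 for r in resid) / len(resid))
    return [r / sd for r in resid]


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Linear correlation coefficient between two equal-length series."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    var_a = sum((p - ma) ** 2 for p in a)
    var_b = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)
```

Detrending matters here because both series carry slow drifts: removing variations longer than the averaging window ensures that the coefficient reflects only the shared short-term fluctuations.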

When averaging P_SKR fluctuations over >1 month, one obtains the
long-term fluctuations already noted by previous authors, which
are therefore merely an average of the short-term ones (Fig. 1). The
long-term variations could therefore also be driven by changes in the
solar wind. In the long term, there does indeed seem to be a relationship
between the solar wind speed and P_SKR: the slow overall decrease
in the solar wind speed from ~550 km s⁻¹ to ~400 km s⁻¹ in Fig. 2a
seems to be anticorrelated with the trend of P_SKR increasing from
<646 min to >649 min.

Figure 2 | ‘Short-term’ variations in P_SKR compared with solar wind speed
at Saturn and with variations in orbital parameters of Cassini. The same
3.25-year interval as in Fig. 1 is shown. a, Variations in P_SKR (heavy line; left
scale) obtained as explained in the text and in Supplementary Figs 1-3. The
solar wind speed plotted below (light line; right scale), measured by the ACE
and WIND spacecraft near the Earth’s orbit (http://omniweb.gsfc.nasa.gov/)
and projected to Saturn, shows similar fluctuations on a timescale of 20-30
days. Solar wind projection includes ballistic radial projection from ~1 to
10 AU, plus a delay compensating for the longitude difference between Earth
and Saturn. The dotted vertical line indicates Cassini’s Saturn orbit
insertion. b, Cassini orbital parameters: distance to Saturn (dashed; left
scale; 1 Rs = 1 Saturn radius = 60,300 km) and latitude (dotted; right scale).
c, Fourier power spectrum of the fluctuations of all the above quantities, plus
the quantity (latitude × solar wind speed), which provides a simple way of
simulating beating between latitudes (and thus visibility of the radio
emission) and variations in the solar wind (SW) speed.

Figure 3 | Comparison of SKR period variations with solar wind speed and
dynamic pressure at Saturn. In the time interval before Saturn orbit
insertion shown here, the spacecraft latitude remains quasi-constant and its
distance to Saturn steadily decreases; no influence of orbital parameters on
SKR visibility is therefore expected. This interval is therefore best suited to
searching for a correlation between P_SKR variations (heavy lines) and the
solar wind parameters (light lines), not polluted by other variabilities. For a
better comparison of their fluctuations on a timescale shorter than ~1
month, the two displayed quantities have been detrended (by subtraction of
a running average over ~2 months) and normalized by their respective
standard deviations. a, Correlation between P_SKR and solar wind speed.
Except for two ~10-day intervals, near DOY −25 and +65, the correlation is
high, with a linear correlation coefficient C = +0.4 (see the text). In the two
main ‘anomalous’ intervals mentioned above, a few data gaps exist in Cassini
SKR data, and the solar wind speed may have been contaminated by the
effect of coronal mass ejections, whose ballistic projection leads to
overestimated values whenever the point at which the solar wind is measured
in situ (here, by ACE or WIND spacecraft) and the target of the projection
(Saturn) are not radially aligned. b, Correlation between P_SKR and solar
wind dynamic (ram) pressure. Correlation is low, with C ≈ −0.1.
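To give a sense of scale for the ballistic projection mentioned in the Fig. 2 legend, the radial travel time of the solar wind from about 1 AU to about 10 AU can be estimated by assuming a constant speed. The sketch below covers only that radial part; the additional delay for the longitude difference between Earth and Saturn is not modelled, and the function name is hypothetical.

```python
AU_KM = 1.495978707e8      # astronomical unit in kilometres
SECONDS_PER_DAY = 86400.0


def ballistic_delay_days(
    speed_km_s: float, r_start_au: float = 1.0, r_end_au: float = 10.0
) -> float:
    """Radial travel time at constant speed between two heliocentric
    distances, in days."""
    distance_km = (r_end_au - r_start_au) * AU_KM
    return distance_km / speed_km_s / SECONDS_PER_DAY
```

At 400 km s⁻¹ the wind takes about 39 days to travel from 1 AU to 10 AU, and at 550 km s⁻¹ about 28 days, which is comparable to the 20-30-day timescale of the fluctuations. Timing errors of a few days in the projection therefore limit the correlation that can be achieved.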

Long-term variations similar to those of P_SKR have been found to
affect Saturn’s azimuthal magnetic field component and possibly
also the electron density in the inner magnetosphere and the position
of the magnetopause. Short-term fluctuations are very difficult
to address for these quantities because they are measured in situ by
Cassini during a small fraction of each orbit. Their dependence on
fluctuations in the solar wind remains to be investigated. If variations
in P_SKR are indeed caused by SKR source displacement in local time,
then the use of Cassini’s instantaneous radio imaging capability for
monitoring motions of the SKR source should permit their deconvolution
from P_SKR measurements, thus permitting a more accurate
determination of Saturn’s true (internal) rotation rate.
Cavity QED with a Bose-Einstein condensate


Ferdinand Brennecke, Tobias Donner, Stephan Ritter, Thomas Bourdel, Michael Kohl & Tilman Esslinger


Cavity quantum electrodynamics (cavity QED) describes the 
coherent interaction between matter and an electromagnetic field 
confined within a resonator structure, and is providing a useful 
platform for developing concepts in quantum information processing.
By using high-quality resonators, a strong coupling
regime can be reached experimentally in which atoms coherently
exchange a photon with a single light-field mode many times
before dissipation sets in. This has led to fundamental studies with
both microwave and optical resonators. To meet the challenges
posed by quantum state engineering and quantum information
processing, recent experiments have focused on laser cooling and
trapping of atoms inside an optical cavity. However, the tremendous
degree of control over atomic gases achieved with
Bose-Einstein condensation has so far not been used for cavity
QED. Here we achieve the strong coupling of a Bose-Einstein 
condensate to the quantized field of an ultrahigh-finesse optical 
cavity and present a measurement of its eigenenergy spectrum. 
This is a conceptually new regime of cavity QED, in which all 
atoms occupy a single mode of a matter-wave field and couple 
identically to the light field, sharing a single excitation. This opens 
possibilities ranging from quantum communication” to a 
wealth of new phenomena that can be expected in the many-body 
physics of quantum gases with cavity-mediated interactions’*™. 

The coherent coupling of a single two-level atom with one mode of 
the quantized light field leads to a splitting of the energy eigenstates of 
the combined system and is described by the Jaynes-Cummings 
model’’. For the experimental realization the strong coupling regime 
has to be reached, where the maximum coupling strength g_0 between
atom and light field is larger than both the amplitude decay rate of the
excited state γ and that of the intracavity field κ. In the case of a thermal
ensemble of atoms coupled to a cavity mode, the individual, position-dependent
coupling for each atom has to be taken into account.

To capture the physics of a Bose-Einstein condensate (BEC) 
coupled to the quantized field of a cavity, we consider N atoms occu- 
pying a single wavefunction. Because the atoms are in the same 
motional quantum state, the coupling g to the cavity mode is identical
for all atoms. Moreover, bosonic stimulation into the macroscopically 
populated ground state should largely reduce the scattering of atoms 
into higher momentum states during the coherent evolution. This 
situation is therefore well described by the Tavis-Cummings model”, 
where N two-level atoms are assumed to identically couple to a single 
field mode. A single cavity photon resonantly interacting with the 
atoms then leads to a collective coupling of g√N.

A key characteristic of the coupled BEC–cavity system is its eigenenergy
spectrum, which we map out with a single excitation present.
An ensemble of thermal atoms does not fulfill the requirement of 
identical coupling, but it shows a similar energy spectrum, which can 
be modelled by the Tavis-Cummings hamiltonian with an effective 
collective coupling’’. In previous measurements'*” and also in a very
recent report”, these eigenenergies have been measured for thermal 
atoms coupled to a cavity. Aside from the sensitivity of the spectrum
to the precise spatial distribution of the atoms, the differences
between a BEC and a thermal cloud, or between a BEC and a Mott 
insulator, should also be accessible through the fluctuations of the 
coupling, that is, in the width of the resonances”. 

First experiments bringing together BEC physics and cavities con- 
centrated on correlation measurements using single atom counting”, 
studied cavity enhanced superradiance of a BEC in a ring cavity”, 
observed nonlinear and heating effects for ultracold atoms in an 
ultrahigh-finesse cavity’*?> and achieved very high control over the 
condensate position within an ultrahigh-finesse cavity using atom 
chip technology”. 

To create a BEC inside an ultrahigh-finesse optical cavity we have
modified our previous set-up”. The experiment uses a magnetic trap
36 mm above the cavity, where we prepare 3.5 × 10° ⁸⁷Rb atoms in
the |F, m_F⟩ = |1, −1⟩ state with a small condensate fraction present.
The atoms are then loaded into the dipole potential of a vertically
oriented standing wave, formed by two counter-propagating laser
beams. By varying the frequency difference δ between the upwards
and the downwards propagating wave, the standing-wave pattern, and
with it the confined atoms, move downwards at a velocity v = δλ/2,
where λ is the wavelength of the trapping laser**’. Because of continuous
evaporative cooling during the transport, the number of atoms
arriving in the cavity is reduced to typically 8.4 × 10° atoms with a
small condensate fraction present. During the 100 ms of transport a
small magnetic field is applied to provide a quantization axis and the
sample remains highly spin-polarized in the |1, −1⟩ state. However,
owing to off-resonant scattering in the transport beams, a small fraction
of the atoms undergoes transitions into the |F = 2⟩ hyperfine state
manifold.

At the position of the cavity mode, the atoms are loaded into a
crossed-beam dipole trap formed by one of the transport beams
and an additional, horizontal dipole beam with a waist radius of
w_x = w_z = 27 µm (see Fig. 1). A final stage of evaporative cooling is
performed by suitably lowering the laser power to final trapping
frequencies (ω_x, ω_y, ω_z) = 2π × (290, 43, 277) Hz, ending up with
an almost pure condensate of 2.2 × 10⁵ atoms.

The ultrahigh-finesse cavity has a length of 176 µm and consists of
symmetric mirrors with a 75 mm radius of curvature, resulting in a
mode waist radius of 25 µm. A slight birefringence splits the resonance
frequency of the empty cavity for the two orthogonal, principal
polarization axes by 1.7 MHz. With the relevant cavity QED
parameters (g_0, κ, γ) = 2π × (10.6, 1.3, 3.0) MHz, the system is in the
strong coupling regime. The length of the cavity is actively stabilized
using a laser at 830 nm (ref. 26). The intracavity intensity of the
stabilization light gives rise to an additional dipole potential of
2.4 E_rec, with the recoil energy defined as E_rec = h²/(2mλ²), where m
is the mass of the atom. The chemical potential μ = 1.8 E_rec of
2.2 × 10⁵ trapped atoms is comparable to the depth of this one-dimensional
lattice, so that long-range phase coherence is well
established in the atomic gas. The 1/e lifetime of the atoms in the
combined trap was measured to be 2.8 s.
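
The quoted mode waist can be cross-checked against the cavity geometry. The sketch below uses the standard Gaussian-beam result for a symmetric two-mirror resonator, w_0² = (λ/2π)·√(L(2R − L)), and also tests the strong-coupling condition. The probe wavelength of 780 nm (the ⁸⁷Rb D₂ line) and the function names are assumptions made for illustration; they are not stated in this paragraph.

```python
import math


def symmetric_cavity_waist(wavelength_m, length_m, mirror_radius_m):
    """Waist radius of the TEM00 mode of a symmetric two-mirror resonator."""
    geometry = math.sqrt(length_m * (2.0 * mirror_radius_m - length_m))
    return math.sqrt(wavelength_m / (2.0 * math.pi) * geometry)


def is_strong_coupling(g0, kappa, gamma):
    """Strong coupling requires g0 to exceed both decay rates."""
    return g0 > kappa and g0 > gamma


# Cavity length and mirror curvature are taken from the text; 780 nm is an
# assumed probe wavelength (87Rb D2 line).
waist_m = symmetric_cavity_waist(780e-9, 176e-6, 75e-3)

# Rates in units of 2*pi*MHz, as quoted in the text.
strong = is_strong_coupling(10.6, 1.3, 3.0)

print(f"mode waist ~ {waist_m * 1e6:.1f} um, strong coupling: {strong}")
```

Under these assumptions the computed waist comes out close to the 25 µm given above.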

To find the eigenenergies of the coupled BEC–cavity system for a
single excitation, we perform transmission spectroscopy with a weak,
linearly polarized probe laser of frequency ω_p. To this end, the resonance
frequency of the empty cavity is stabilized to a frequency ω_c,
which in general is detuned by a variable frequency Δ_c = ω_c − ω_a
with respect to the frequency ω_a of the |F = 1⟩ → |F′ = 2⟩ transition
of the D₂ line of ⁸⁷Rb. The transmission of the probe laser through the
cavity is monitored as a function of its detuning Δ_p = ω_p − ω_a (see
Fig. 2). The two orthogonal circular polarizations of the transmitted
light are separated and detected with single-photon counting modules.
The overall detection efficiency for an intracavity photon is 5%.
To probe the system in the weak excitation limit, the probe laser
intensity is adjusted such that the mean intracavity photon number
is always below the critical photon number n_0 = γ²/(2g_0²) = 0.04. A
magnetic field of 0.1 G, oriented parallel (within 10%) to the cavity
axis, provides a quantization axis.
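
As a rough illustration of the weak-excitation limit, the following sketch evaluates the critical photon number from the quoted rates and estimates the corresponding detected count rate. The count-rate estimate assumes that intracavity photons leave the cavity at the intensity decay rate 2κ and that the 5% efficiency applies to each of them; that model and the function names are assumptions, not part of the text.

```python
import math


def critical_photon_number(g0, gamma):
    """n_0 = gamma^2 / (2 g0^2); g0 and gamma in the same units."""
    return gamma ** 2 / (2.0 * g0 ** 2)


def detected_count_rate(mean_photons, kappa_hz, efficiency):
    """Assumed model: photons leak out at 2*kappa, a fraction is detected."""
    return mean_photons * 2.0 * kappa_hz * efficiency


n0 = critical_photon_number(10.6, 3.0)      # rates in units of 2*pi*MHz
kappa_hz = 2.0 * math.pi * 1.3e6            # amplitude decay rate in rad/s
rate_at_n0 = detected_count_rate(n0, kappa_hz, 0.05)

print(f"n_0 = {n0:.3f}, detected rate at n_0 ~ {rate_at_n0:.2e} counts/s")
```

In this simple picture the signal at n_0 stays orders of magnitude above the dark-count background of about 60 counts per second quoted in the caption of Fig. 2.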

From individual recordings of the cavity transmission as shown in
Fig. 2 we map out the low-excitation spectrum of the coupled system
as a function of Δ_c (see Fig. 3). After resonant excitation we do not
detect an influence on the BEC in absorption imaging for large atom
numbers (see Fig. 1). For small BECs of the order of 5,000 atoms we
observe a loss of 50% of the atoms after resonant probing. The normal
mode splitting at Δ_c = 0 amounts to 7 GHz for σ⁺ polarization,
which results in a collective cooperativity of C = Ng²/(2κγ) =
1.6 × 10⁶. The splitting for the σ⁻ component is smaller, because
the dipole matrix elements for transitions starting in |1, −1⟩ driven
by this polarization are smaller than those for σ⁺.
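
The cooperativity follows from the measured splitting if the splitting is identified with twice the collective coupling g√N, as in the Tavis-Cummings picture. The short sketch below makes that arithmetic explicit; the identification and the function name are assumptions for illustration.

```python
def collective_cooperativity(splitting_mhz, kappa_mhz, gamma_mhz):
    """C = N g^2 / (2 kappa gamma), with g*sqrt(N) taken as half the splitting.

    All arguments are frequencies divided by 2*pi, in MHz.
    """
    g_collective = splitting_mhz / 2.0
    return g_collective ** 2 / (2.0 * kappa_mhz * gamma_mhz)


cooperativity = collective_cooperativity(7000.0, 1.3, 3.0)
print(f"C ~ {cooperativity:.2e}")
```

With the 7 GHz splitting and the decay rates quoted earlier this gives a value of about 1.6 × 10⁶.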

A striking feature of the energy spectrum in Fig. 3 is a second 
avoided crossing at probe frequencies resonant with the bare atomic 
transitions |F = 2⟩ → |F′ = 1, 2, 3⟩. It is caused by the presence of
atoms in the |F = 2⟩ hyperfine ground state. This avoided crossing

Figure 1 | Experimental situation. a, 36 mm above the cavity, 3.5 × 10°
ultracold atoms are loaded into the dipole potential of a vertically oriented
one-dimensional optical lattice. This trumpet-shaped standing wave has its
waist inside the ultrahigh-finesse cavity and is composed of two counter-propagating
laser beams. A translation of the lattice transports the atoms
into the cavity mode. There, they are loaded into a crossed-beam dipole trap
formed by a focused beam oriented along the y axis and one of the transport
beams. b, Almost pure condensates with 2.2 × 10⁵ atoms are obtained.

Figure 2 | Cavity transmission for the σ⁺ and σ⁻ polarization components.
The probe laser frequency is scanned at a speed of 25 MHz ms⁻¹ while the
cavity detuning is fixed. The original transmission data, recorded with a
resolution of 0.4 µs, are averaged over 2 ms using a sliding average. A single
peak for each polarization can clearly be distinguished from the background
of about 60 dark counts per second.


is located at a cavity detuning where the eigenenergy branch of
the BEC–cavity system with no atoms in |F = 2⟩ would intersect
the energy lines of the atomic transitions |F = 2⟩ → |F′ = 1, 2, 3⟩.
Accordingly, the avoided crossing is shifted by approximately
Ng²/Δ_p = 2π × 1.8 GHz with respect to the intersection of the empty
cavity resonance with the bare atomic transition frequencies. From a
theoretical analysis (see Methods), we find the size of the |F = 2⟩
minority component to be 1.7% of the total number of atoms.

Our near-planar cavity supports higher-order transverse modes 
equally spaced by 18.5 GHz, which is of the order of the collective 
coupling g√N in our system. In general, the presence of one additional
mode with the same coupling but detuned from the TEM₀₀


Figure 3 | Energy spectrum of the coupled BEC–cavity system. The data
points are measured detunings of resonances for σ⁺ (red circles) and σ⁻
(black triangles) polarized light. Each data point is the average of three
measurements with an uncertainty of about 25 MHz. The solid lines are the
result of a theoretical model (see Methods). Bare atomic resonances are
shown as dotted lines, whereas the empty cavity resonance of the TEM₀₀
mode is plotted as a dashed-dotted line. Note the asymmetry in the splitting
at Δ_c = 0 caused by the influence of higher-order transverse modes.
Neglecting this influence, the eigenenergies shown by the dashed lines would
be expected where the free parameters were adjusted to fit the spectrum for
Δ_c < 0. Axis label: probe detuning Δ_p/2π (GHz).

mode by Δ_t would shift the resonance frequencies at Δ_c = 0 to first
order by Ng²/(2Δ_t). In our system, this results in a clearly visible
change of the energy spectrum with respect to a system with a single
cavity mode only (see dashed lines in Fig. 3). This can be seen as a
variant of the “superstrong coupling regime””*, in which the coupling
between atoms and the light field is of the order of the free
spectral range of the cavity.

We describe the BEC-cavity system in a fully quantized theoretical 
model (see Methods), which yields the eigenenergies of the coupled 
system. Good agreement between the measured data and the model is 
found (see Fig. 3) for 154,000 atoms in the |1, −1⟩ state and for 2,700
atoms distributed over the Zeeman sublevels of the |F = 2⟩ state, with
the majority in |2, −1⟩. The substantial influence of the higher-order
transverse modes is modelled for simplicity by the coupling of the 
BEC to one additional effective cavity mode. 

To test the square-root dependence of the normal mode splitting
on the number of atoms in the BEC, a second measurement was
conducted. We set the cavity frequency to Δ_c = 0 and record the
detuning of the lower coupled state from the bare atomic resonance
|F = 1⟩ → |F′ = 2⟩ as a function of the number of atoms as displayed
in Fig. 4. The atom number was varied between 2,500 and 200,000,
determined from separately taken absorption images with an estimated
statistical error of ±10%; possible systematic shifts are estimated
to be within ±7%. The dependence of |Δ_p| on the number of
atoms is well described by a square root, as expected from the
Tavis–Cummings model (dashed lines). However, for a weakly interacting
BEC the size of the atomic cloud, and thus the spatial overlap with
the cavity mode, depends on the atom number. Our more detailed
model, which includes this effect, as well as the influence of higher-order
cavity modes, yields maximum single-atom couplings of
g_σ+ = 2π × (14.4 ± 0.3) MHz and g_σ− = 2π × (11.3 ± 0.2) MHz for
the two polarization components (solid lines). The ratio of these two
couplings is 1.27 ± 0.03 and agrees with the ratio of 1.29 that is
obtained from the effective Clebsch–Gordan coefficients for the σ⁺
and σ⁻ transitions starting in state |1, −1⟩.
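
The quoted ratio and its uncertainty can be reproduced with first-order error propagation. The sketch below assumes that the two fitted couplings are uncorrelated, which the text does not state; the function name is illustrative.

```python
import math


def ratio_with_error(a, sigma_a, b, sigma_b):
    """Return a/b and its 1-sigma uncertainty for uncorrelated a and b."""
    ratio = a / b
    relative = math.sqrt((sigma_a / a) ** 2 + (sigma_b / b) ** 2)
    return ratio, ratio * relative


# Couplings in units of 2*pi*MHz, as quoted in the text.
ratio, sigma = ratio_with_error(14.4, 0.3, 11.3, 0.2)
print(f"g_sigma+/g_sigma- = {ratio:.2f} +/- {sigma:.2f}")
```

The result, 1.27 ± 0.03, lies within one standard deviation of the Clebsch–Gordan ratio of 1.29.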

Figure 4 | Shift of the lower resonance of the coupled BEC–cavity system
from the bare atomic resonance. The cavity was locked at Δ_c = 0. σ⁺ and
σ⁻ polarization are shown as red circles and black triangles, respectively.
Each data point is the average of three measurements. The atom number was
determined separately from absorption images with an assumed error of
±10%; the vertical error bars are too small to be resolved. The dashed lines
are fits of the square root dependence on the atom number, as predicted by
the Tavis–Cummings model. The solid lines are fits of a more detailed
theoretical model (see Methods) resulting in maximum coupling rates
g_σ+ = 2π × (14.4 ± 0.3) MHz and g_σ− = 2π × (11.3 ± 0.2) MHz. The ratio
of the two coupling rates of 1.27 ± 0.03 agrees with the expected ratio of 1.29
of the corresponding Clebsch–Gordan coefficients. Horizontal axis: atom
number N (10⁵).




NATURE|Vol 450|8 November 2007 


The coupling of a single mode of a matter-wave field to a single 
cavity mode opens a route to new experiments. It facilitates the mani- 
pulation and study of statistical properties of quantum-degenerate 
matter-wave fields by a quantized optical field, or even the generation 
of entanglement between these two fields*'”’. The detection of 
single atoms falling through the cavity has already been demon- 
strated with this set-up”. In principle, the detection of small impurity 
components embedded in a large BEC presented here can also be 
extended to single atoms. This is an important step towards the 
realization of schemes aiming at the cooling of qubits immersed in 
a large BEC”. 


METHODS SUMMARY 

Optical transport. The transport of the atoms into the cavity is accomplished in
T = 100 ms with a maximum acceleration of a = 22.4 m s⁻². The standing wave
used to transport the atoms has its waist (w_x, w_y) = (25, 50) µm centred inside
the cavity and is locked to a ¹³³Cs resonance at a wavelength of 852 nm. The
intensity and frequency of each beam are precisely controlled by acousto-optical
modulators, which are driven by two phase-locked, homebuilt direct digital
synthesis generators. The frequency difference δ between the two counter-propagating
waves follows δ(t) = [1 − cos(2πt/T)] δ_max/2, with a maximum
detuning of δ_max = 1,670 kHz. With the maximally available power of 76 mW
per beam the trap depth at the position of the magnetic trap is 1.1 µK. During
transport, the power in the laser beams is kept constant until the intensity at the
position of the atoms has increased by a factor of ten. Subsequently, this intensity
is kept constant.
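
The transport parameters can be checked for internal consistency. The sketch below integrates the lattice velocity v = δλ/2 over the frequency ramp to obtain the travelled distance, and evaluates the peak acceleration analytically. It assumes that δ is an ordinary frequency in Hz (not an angular frequency); the constant and function names are illustrative.

```python
import math

WAVELENGTH_M = 852e-9     # trapping-laser wavelength (from the text)
DURATION_S = 0.100        # transport time T (from the text)
DELTA_MAX_HZ = 1.670e6    # maximum frequency difference, assumed to be in Hz


def lattice_velocity(t):
    """Velocity of the standing-wave pattern, v = delta(t) * lambda / 2."""
    delta = 0.5 * DELTA_MAX_HZ * (1.0 - math.cos(2.0 * math.pi * t / DURATION_S))
    return 0.5 * WAVELENGTH_M * delta


def transport_distance(steps=10_000):
    """Midpoint-rule integral of the velocity over the full ramp."""
    dt = DURATION_S / steps
    return sum(lattice_velocity((i + 0.5) * dt) for i in range(steps)) * dt


distance_m = transport_distance()

# Peak of dv/dt = (lambda/2) * (delta_max/2) * (2*pi/T) * sin(2*pi*t/T).
max_accel = 0.5 * WAVELENGTH_M * 0.5 * DELTA_MAX_HZ * 2.0 * math.pi / DURATION_S

print(f"distance ~ {distance_m * 1e3:.1f} mm, peak acceleration ~ {max_accel:.1f} m/s^2")
```

Under this assumption the distance comes out close to the 36 mm separation between magnetic trap and cavity, and the peak acceleration close to the quoted 22.4 m s⁻².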

Theoretical model. To gain understanding of the presented measurements we
have developed a fully quantized theoretical description of the coupled BEC–cavity
system. Our model includes all Zeeman sublevels in the 5²S₁/₂ and 5²P₃/₂
state manifolds of ⁸⁷Rb and both orthogonal polarizations of the TEM₀₀ cavity
mode. For simplicity, the effect of higher-order transverse cavity modes is modelled
by the effective coupling to one additional mode which is detuned from the
TEM₀₀ mode by Δ_t = 2π × 18.5 GHz. The free parameters of the model are the
coupling strength between this effective mode and the BEC and the population of
the several ground states in the condensate. Good agreement with the measured
energy spectrum is found for the ground-state population given in the main text
and a coupling between BEC and effective cavity mode that is r = 1.2 times the
coupling to the TEM₀₀ mode.


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 

METHODS


To characterize the coupled BEC–cavity system theoretically, we start with the
second-quantized hamiltonian describing the matter–light interaction in the
electric-dipole and rotating-wave approximation”. We take all hyperfine states
including their Zeeman sublevels in the 5²S₁/₂ and 5²P₃/₂ state manifolds of ⁸⁷Rb
into account and describe the cavity degrees of freedom by two orthogonal linear
polarizations of the TEM₀₀ mode. We choose the quantization axis, experimentally
provided by a small magnetic field, to be oriented parallel to the cavity axis.
The near-planar cavity supports higher-order transverse modes equally spaced
by Δ_t = 2π × 18.5 GHz. To incorporate the coupling to all higher-order transverse
modes we include one additional effective cavity mode in our model with
its resonance frequency shifted by Δ_t with respect to that of the TEM₀₀ mode.

Considering only one spatial atomic mode for the ground-state manifold and
another one for the excited-state manifold, the hamiltonian of the uncoupled
system reads

$$H_0 = \sum_i \hbar\omega_{g_i}\,\hat{g}_i^\dagger \hat{g}_i + \sum_j \hbar\omega_{e_j}\,\hat{e}_j^\dagger \hat{e}_j + \sum_{k=0}^{1}\sum_{p} \hbar\omega_k\,\hat{a}_{k,p}^\dagger \hat{a}_{k,p}$$

where the indices i and j label the states 5²S₁/₂ |F, m_F⟩ and 5²P₃/₂ |F′, m_F′⟩,
respectively, and the sum over p runs over the two orthogonal linear
polarizations. The operators $\hat{g}_i^\dagger$ (or $\hat{g}_i$) and $\hat{e}_j^\dagger$ (or $\hat{e}_j$) create (or annihilate) an
atom in the mode of the corresponding ground and excited states with frequencies
$\omega_{g_i}$ and $\omega_{e_j}$. The operators $\hat{a}_{k,p}^\dagger$ (or $\hat{a}_{k,p}$) create (or annihilate) a photon with
energy $\hbar\omega_k$ and linear polarization p in the cavity mode k, where k = 0, 1 labels
the TEM₀₀ mode and the additional effective cavity mode, respectively.

The coupling between the BEC and the cavity is described by the interaction
hamiltonian

$$H_{\mathrm{int}} = -i\hbar \sum_{k=0}^{1}\sum_{p}\sum_{i,j} g_{ij}^{k,p}\,\hat{e}_j^\dagger\,\hat{a}_{k,p}\,\hat{g}_i + \mathrm{h.c.}$$

where $g_{ij}^{k,p}$ denotes the coupling strength for the transition i → j driven by the
cavity mode k with polarization p, and h.c. is the hermitian conjugate. For the
TEM₀₀ mode the coupling strength $g_{ij}^{0,p}$ depends on the dipole matrix element
$D_{ij}^{p}$ for the transition i → j driven by the polarization p, the mode volume $V_0$, and
the overlap $U_0$ between the two spatial atomic modes and the TEM₀₀ mode:

$$g_{ij}^{0,p} = D_{ij}^{p}\,\sqrt{\frac{\hbar\omega_0}{2\varepsilon_0 V_0}}\;U_0$$



We numerically calculated the ground state of the Gross–Pitaevskii equation in
the potential formed by the dipole trap and the cavity stabilization light for
N = 2 × 10⁵ atoms, and found the spatial overlap between BEC and TEM₀₀
mode to be U_0 = 0.63. Because of the position uncertainty of the BEC relative
to the cavity mode, the overlap might deviate by up to 20% from this value.
The repulsive interaction between the atoms leads to a slight decrease in U_0
with increasing atom number, which was found numerically to follow
U_0(N) = √[0.5(1 − 0.0017N°**)]. For the coupling strength to the additional
effective cavity mode we assume $g_{ij}^{1,p} = r\,g_{ij}^{0,p}$, with r being a free parameter.


The initial atomic population of the several ground states i, given by the free
parameters N_i, is relevant for the form of the energy spectrum of the coupled
BEC–cavity system. To find the energy spectrum in the weak excitation limit we
can restrict the analysis to states containing a single excitation. These are the
states $|1_{k,p};\,N_1, \ldots, N_i, \ldots;\,0\rangle$, where one photon with polarization p is present in the
mode k and all atoms are in the ground-state manifold, and also all possible states
$|0_{k,p};\,N_1, \ldots, N_i - 1, \ldots;\,1_j\rangle$, where no photon is present and one atom was
transferred from the ground state i to the excited state j. Diagonalization of the
hamiltonian H = H_0 + H_int in the truncated Hilbert space spanned by these states
yields the eigenspectrum of the coupled system as a function of the cavity detuning
Δ_c. The relevant transition energies with respect to the energy of the initial
ground state are plotted in Fig. 3.

The detuning of the lower resonance branch at Δ_c = 0 is a function of the atom
number and given by

$$|\Delta_p| = U_0(N)\,g_{\sigma^\pm}\sqrt{N} + \frac{\left(U_0(N)\,r\,g_{\sigma^\pm}\right)^2 N}{2\Delta_t} + O(1/\Delta_t^2)$$

Fitting this dependence to the data in Fig. 4 yields the maximum coupling
strengths g_σ± for the two polarization components σ±.
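
To illustrate how the second cavity mode shifts the lower branch, the following sketch diagonalizes a reduced three-state version of the single-excitation problem: one photon in the TEM₀₀ mode, one photon in the effective higher-order mode, and one collective atomic excitation. This is a simplification and not the full multi-level model described above. The collective coupling of 3.5 GHz (half the 7 GHz splitting), the positive sign of Δ_t and all names are assumptions; r = 1.2 and Δ_t/2π = 18.5 GHz are taken from the text.

```python
import numpy as np


def single_excitation_spectrum(delta_c, g_coll, r, delta_t):
    """Eigenfrequencies relative to the atomic transition (units of 2*pi*GHz).

    Basis: photon in TEM00, photon in the effective mode, atomic excitation.
    """
    h = np.array([
        [delta_c, 0.0, g_coll],
        [0.0, delta_c + delta_t, r * g_coll],
        [g_coll, r * g_coll, 0.0],
    ])
    return np.linalg.eigvalsh(h)  # ascending order


G_COLL = 3.5     # assumed collective coupling, half the measured splitting
R = 1.2          # relative coupling to the effective mode (from the text)
DELTA_T = 18.5   # detuning of the effective mode (from the text)

lower_exact = -single_excitation_spectrum(0.0, G_COLL, R, DELTA_T)[0]
lower_first_order = G_COLL + (R * G_COLL) ** 2 / (2.0 * DELTA_T)

print(f"|Delta_p| exact: {lower_exact:.3f} GHz, first order: {lower_first_order:.3f} GHz")
```

The exact value and the first-order expression differ at the level of the neglected O(1/Δ_t²) terms, and both lie above the single-mode value g√N, which is the asymmetry noted in Fig. 3.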

Strong atom-field coupling for Bose-Einstein
condensates in an optical cavity on a chip 


Yves Colombe’, Tilo Steinmetz’*, Guilhem Dubois’, Felix Linke't, David Hunger” & Jakob Reichel’ 


An optical cavity enhances the interaction between atoms and light, 
and the rate of coherent atom—photon coupling can be made larger 
than all decoherence rates of the system. For single atoms, this 
‘strong coupling regime’ of cavity quantum electrodynamics’” 
has been the subject of many experimental advances. Efforts have 
been made to control the coupling rate by trapping** the atom and 
cooling>* it towards the motional ground state; the latter has been 
achieved in one dimension so far’. For systems of many atoms, the 
three-dimensional ground state of motion is routinely achieved’ in 
atomic Bose-Einstein condensates (BECs). Although experiments 
combining BECs and optical cavities have been reported recently*”, 
coupling BECs to cavities that are in the strong-coupling regime for 
single atoms has remained an elusive goal. Here we report such an 
experiment, made possible by combining a fibre-based cavity’ 
with atom-chip technology''. This enables single-atom cavity 
quantum electrodynamics experiments with a simplified set-up 
and realizes the situation of many atoms in a cavity, each of which 
is identically and strongly coupled to the cavity mode’”. Moreover, 
the BEC can be positioned deterministically anywhere within the 
cavity and localized entirely within a single antinode of the standing-wave
cavity field; we demonstrate that this gives rise to a controlled,
tunable coupling rate. We study the heating rate caused by a
cavity transmission measurement as a function of the coupling rate 
and find no measurable heating for strongly coupled BECs. The 
spectrum of the coupled atoms-cavity system, which we map out 
over a wide range of atom numbers and cavity—atom detunings, 
shows vacuum Rabi splittings exceeding 20 gigahertz, as well as an 
unpredicted additional splitting, which we attribute to the atomic 
hyperfine structure. We anticipate that the system will be suitable 
as a light—-matter quantum interface for quantum information”. 
The interaction of an ensemble of N atoms with a single mode 
of radiation has been a recurrent theme in quantum optics at least 
since the work of Dicke'*, who showed that under certain conditions 
the atoms interact with the radiation collectively, giving rise to new 
effects such as superradiance. Recently, collective interactions with 
weak fields, with and without a cavity, have become a focus of 
theoretical and experimental investigations, especially since it 
became clear that they can turn the ensemble into a quantum mem- 
ory’*'>. Such a memory would become a key element for processing 
quantum information'*”® if realized with near-unit conversion effi- 
ciency and long storage time. The figure of merit determining the 
probability of converting an atomic excitation into a cavity photon (a
‘memory qubit’ into a ‘flying qubit’) is the collective cooperativity
C_N = g_N²/(2κγ), where g_N is the collective coupling strength’’
between the ensemble and the field, 2κ is the cavity photon decay
rate and 2γ the atomic spontaneous emission rate. (Up to a factor of
order 1, C_N is the single-pass optical depth of the atomic sample
multiplied by the cavity finesse F.) For weak excitation, to which
we restrict ourselves throughout this Letter, g_N = √N ḡ_1, where
ḡ_1² = ∫ (ρ(r)/N) |g_1(r)|² dr, g_1(r) is the position-dependent single-atom
coupling strength and ρ(r) is the atomic density distribution: the
ensemble couples to the mode as a single ‘superatom’ with a coupling
strength increased by √N. In the strong-coupling regime g_N > κ, γ,
which is realized in our cavity even for N = 1, the atomic ensemble
oscillates between its ground state and a symmetric excited state
where a single excitation is shared by all the atoms.
The coupled atoms–cavity system has dressed states of energies

$$E_\pm = \hbar\omega_A + \frac{\hbar}{2}\left(\Delta_C \pm \sqrt{\Delta_C^2 + 4g_N^2}\right) \qquad (1)$$

where Δ_C = ω_C − ω_A, ω_C and ω_A being the cavity and atom resonance
frequencies. When the cavity is tuned to atomic resonance
(Δ_C = 0), these states are separated by the vacuum Rabi frequency'’”
2g_N. The collective interaction and √N scaling are not a consequence
of atomic quantum statistics, but apply to thermal and quantum 
degenerate bosons and fermions, and even to non-identical particles 
having the same transition moment”"’, for a wide range of conditions 
in which interparticle correlations in the initial atomic state are neg- 
ligible (see Supplementary Information). Nevertheless, the spatial 
coherence, fundamentally lowest kinetic energy and smallest size of 
a BEC influence its interaction with light, and make it the most 
desirable atomic state in many situations of conceptual and practical 
interest: the BEC makes it possible to maximize the coupling and 
avoid decoherence effects associated with spatial inhomogeneities 
and with atomic motion’. In the free-space case, some of these 
aspects have been shown in detail for the case of superradiance, where 
reduced Doppler broadening in the condensate increased the coher- 
ence time by a factor of 30 over a thermal cloud at the transition 
temperature, making the effect observable only in the BEC”. 

Here we take advantage of the fact that the BEC has the smallest 
possible position spread in a given trap. This allows us to load the 
BEC into a single site of a far-detuned intracavity optical lattice. By 
choosing the lattice site, we achieve well-defined, maximized atom–field
coupling in the standing-wave cavity field, where the local coupling
varies as g_1(x, r_⊥) = g_0 cos(2πx/λ_C) exp(−r_⊥²/w²) (here x and r_⊥
are respectively the longitudinal and transverse atomic coordinates,
and w and λ_C = 2πc/ω_C are respectively the mode radius and wavelength).
This is an important improvement for applications such as
the quantum memory. Furthermore, BEC-cavity quantum electro- 
dynamics (QED) experiments such as ours and a simultaneous 
similar one’’ can be used to study BECs in the regime of very small 
atom numbers where the mean-field approximation breaks down, 
and may allow observation of effects such as a predicted slight 
modification of the refractive index of the atomic sample close to 
the BEC transition”, and differences between quantum phases in 

Figure 1 | Experimental set-up. a, Layout of the atom chip. ‘A’ indicates
the location of the first magnetic trap, loaded from a magneto-optical trap. 
b, Close-up view of the two fibre Fabry—Perot (FFP) optical cavities that are 
mounted on the chip. Cavity modes are drawn to scale in red. The BEC is 
produced in a magnetic trap and positioned in the FFP1 mode (‘B’). 


transmission spectra of cavities containing a degenerate gas in an 
optical lattice’’. 

We have developed a novel type of fibre-based Fabry—Perot (FFP) 
cavity’®”, which achieves large single-atom peak coupling rates go 
through reduced mode volume and high mirror curvature (see 
Methods), without the difficulties associated with evanescent fields 
in microtoroidal** or microsphere cavities. The set-up is shown in 
Fig. 1. The combination (g_0 = 2π × 215 MHz, κ = 2π × 53 MHz,
γ = 2π × 3 MHz) places our cavity in the single-atom strong coupling
regime, and leads to a high single-atom cooperativity C_0 = 145.
Despite its finesse, which is an order of magnitude below that of 
standard cavities used in cavity QED, the performance data of our 
cavity are comparable or superior to most of those, while all the 
dynamics occurs on a faster timescale. Two laser beams are coupled 
into the cavity through the input fibre—a weak tunable probe beam 


LETTERS 




c, Geometry of the FFP1 cavity. d, Overlay of three CCD time-of-flight 
(TOF) absorption images, showing the anisotropic expansion of a BEC 
having interacted for 50 ms with the cavity field under conditions similar to 
Fig. 4. The optical fibres are outlined for clarity. 


(frequency ω_L), and an optional far-detuned beam at λ_D = 830.6 nm
used to form a one-dimensional optical lattice along the cavity axis.
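
The single-atom cooperativity quoted for the fibre cavity follows directly from the three rates. The sketch below evaluates C_0 = g_0²/(2κγ) and, purely as an illustration, the ideal collective coupling g_0√N for an example atom number; N = 1,000 and the function names are assumptions chosen for the example.

```python
import math


def cooperativity(g, kappa, gamma):
    """C = g^2 / (2 kappa gamma); all rates in the same units."""
    return g ** 2 / (2.0 * kappa * gamma)


def ideal_collective_coupling(g0, atom_number):
    """Upper bound g0*sqrt(N), reached only if every atom sits at peak coupling."""
    return g0 * math.sqrt(atom_number)


c0 = cooperativity(215.0, 53.0, 3.0)                # rates in units of 2*pi*MHz
g_n_mhz = ideal_collective_coupling(215.0, 1000)    # illustrative N

print(f"C_0 ~ {c0:.0f}, ideal g_N/2pi for N = 1000 ~ {g_n_mhz / 1e3:.1f} GHz")
```

The first number reproduces C_0 = 145; the second is an upper bound, since a real cloud samples positions where the local coupling is below g_0.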

Using a combination of chip currents and external magnetic fields, 
a BEC or cold thermal cloud of ⁸⁷Rb atoms in the |F = 2, m_F = 2⟩
ground state is prepared inside the cavity, and then positioned 
anywhere within the cavity mode. The slow axis of the magnetic 
trap can be oriented along the cavity axis x or perpendicular to it 
along y. Atom-field interaction is studied either directly in this 
trap, or after switching on the far-detuned lattice to increase the 
confinement and gain control over the coupling rate. For magnetic 
trapping and also for weak lattices, we are able to observe an intact 
BEC after its interaction with the cavity field (Fig. 1d). For lattices 
with trapping frequency above ν_x ≈ 20 kHz, technical fluctuations in
the lattice potential’, possibly due to laser intensity noise, heat up the 
condensate. 

Figure 2 | Control of the coupling along the resonator axis. The BEC is
brought to a position x_a on the cavity axis, and loaded into the optical lattice.
a, The probe laser (λ_L = 780.2 nm) and optical lattice (λ_D = 830.6 nm)
standing waves in the cavity have a variable overlap with 6.4 µm period. The
green bar (top) indicates the region of the measurement in c. b, The loaded
atoms show a strongly modulated coupling 3.4 GHz ≤ g_N(x_a)/2π ≤ 6.6 GHz,
depending on the local overlap between lattice and probe field. The rapid
decrease of g_N at the extremities of the mode is probably due to atom loss
caused by collisions with the mirrors. The pictogram inset shows the 
orientation of the magnetic trap relative to the cavity mode. Probe intensity 


corresponds to a mean intracavity photon number for a resonant cavity 
without atoms of n,.,~ 5.5 X 10°”. (Here and in the following experiments, 
we have checked that intensity-dependent peak shifts are negligible.) 
Maximum transmission in the figure corresponds to an intracavity photon 
number n ~ 1.1 X 10. ¢, The transmission of the cavity, probed at 

Δ_L = Δ_C = 2π × (−100 GHz) with position increments δx_a = 40 nm, exhibits
well-separated steps owing to the loading into successive single lattice sites. 
Vertical lines indicate the intensity minima of the optical lattice. Horizontal 
lines are expected transmission values with atoms localized in successive 


single lattice sites. Probe intensity: nye, ~ 3.3 X 10 7. 
In the following, we describe three experiments that explore the
main aspects of atom-field interaction in our system. First, we study
the position dependence of g_N(x) and show that full control is
achieved. Second, we observe the dependence of g_N on atom number,
and map out the energies of the dressed states. Finally, we investigate
the heating of a condensate by the intracavity field.

To study the position dependence of g_N(x), we start by placing a
BEC containing N ≈ 1,000 atoms at a position x on the cavity axis in
a magnetic trap oriented along y (ν_{x,z} = 2.7 kHz, ν_y = 230 Hz, bias
field B_y ≈ 1 G). We then ramp up a tight optical lattice with trapping
frequencies ν_x = 50 kHz, ν_{y,z} = 2.4 kHz. The loaded atoms are now
strongly confined in the combined trap, even though no longer
Bose-condensed owing to the technical heating. As the lattice and the
probed cavity mode have different wavelengths λ_p and λ_L, their
overlap is modulated with a period λ_p λ_L/[2(λ_p − λ_L)] = 6.4 μm
(Fig. 2a). We measure g_N(x) by sweeping the probe laser detuning
Δ_L = ω_L − ω_A = 2π × (+1 ... −13) GHz in 50 ms, with cavity detuning
Δ_C = ω_C − ω_A = 0. A transmission peak occurs at |Δ_L| = g_N
when the lower dressed state is excited. Figure 2b shows the result
for x values spanning the full cavity length. We are able to reproduce
the observed g_N(x) by calculating the coupling of a gaussian cloud
centred on a single lattice site. The corresponding fit (red line in
Fig. 2b) using the cloud diameter 2σ_x as a free parameter gives
the value 2σ_x = 130 nm, from which we deduce 2σ_y = 2.7 μm,
2σ_z = 1.8 μm based on the known ratio of the trapping frequencies
and assuming thermal equilibrium (T = 4.4 μK). This fit gives a good
indication of σ_x, but does not prove single-site loading because a
similar fit is obtained when considering several clouds in adjacent
sites, each with 2σ_x = 130 nm.
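The numbers in this paragraph can be cross-checked with a short calculation. The sketch below is illustrative only: the function names are hypothetical, and it assumes (this is not stated in the text) that the magnetic and lattice trapping frequencies add in quadrature in the combined trap and that, at fixed temperature, the thermal cloud diameter scales inversely with trapping frequency.

```python
import math


def overlap_period_nm(lambda_a_nm, lambda_b_nm):
    """Beat period of two standing waves sharing the same cavity.

    Each standing wave has an intensity period of lambda/2, so the overlap
    repeats every lambda_a * lambda_b / (2 * |lambda_a - lambda_b|).
    """
    return lambda_a_nm * lambda_b_nm / (2.0 * abs(lambda_a_nm - lambda_b_nm))


def combined_frequency_khz(nu_magnetic_khz, nu_lattice_khz):
    """Assumption: independent harmonic potentials add in quadrature."""
    return math.hypot(nu_magnetic_khz, nu_lattice_khz)


def scaled_diameter_nm(ref_diameter_nm, ref_freq_khz, freq_khz):
    """Assumption: thermal equilibrium, so the cloud size scales as 1/frequency."""
    return ref_diameter_nm * ref_freq_khz / freq_khz


# Probe at 780.2 nm, lattice at 830.6 nm (values quoted in the text).
period_nm = overlap_period_nm(780.2, 830.6)

# Combined-trap frequencies along y and z (magnetic trap plus lattice).
nu_y_khz = combined_frequency_khz(0.23, 2.4)
nu_z_khz = combined_frequency_khz(2.7, 2.4)

# Scale the fitted 130 nm diameter (50 kHz axis) to the weaker axes.
diameter_y_nm = scaled_diameter_nm(130.0, 50.0, nu_y_khz)
diameter_z_nm = scaled_diameter_nm(130.0, 50.0, nu_z_khz)

print(f"overlap period: {period_nm / 1000:.2f} um")
print(f"cloud diameters: {diameter_y_nm / 1000:.2f} um (y), {diameter_z_nm / 1000:.2f} um (z)")
```

Under these assumptions the script gives a period close to 6.4 μm and diameters close to 2.7 μm and 1.8 μm, in line with the values quoted above.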

To demonstrate unambiguously the transfer into a single lattice
site, we measure the transmission as a function of x in the dispersive
regime, Δ_L = Δ_C = 2π × −100 GHz, where it depends on the
atom-induced cavity resonance shift δω_C = g_N²/Δ_L. If and only if single-site
loading is achieved, transmission should change in steps between x
values corresponding to adjacent lattice sites. We use reduced
increments δx = 40 nm, a BEC with N ≈ 600 and a magnetic trap with
ν_{x,z} = 4 kHz, ν_y = 230 Hz. Using analytical formulas for the
one-dimensional/three-dimensional crossover regime applying to this
BEC, we find a central radial diameter 2σ_{x,z} = 2 × 1.37σ_ho = 330 nm,
where σ_ho is the harmonic-oscillator ground state radius. This BEC is
loaded into a lattice with ν_x = 100 kHz, ν_{y,z} = 4.8 kHz (destroying the
condensate, but producing stronger confinement along x), and the
magnetic potential is switched off. Figure 2c shows the transmission of
the cavity. Each point is averaged over two experimental runs.
Well-separated plateaus are observed, corresponding to discrete values of
g_N in good agreement with the calculated transmissions for
ensembles localized in a single lattice site with diameter 2σ_x = 100 nm
(horizontal lines). This experiment shows the deterministic transfer
of the atom cloud into successively addressed single sites of the lattice,
each of which is differently coupled to the cavity.

A crucial feature of the collective coupling is its scaling with atom
number. Early strong-coupling cavity QED experiments measured
this in an atomic-beam apparatus with fluctuations both in N and in
the atomic spatial positions. In our experiment, a BEC or a strongly
confined cold thermal cloud minimizes position fluctuations and N
remains fixed (as long as the interaction time is short enough to
induce no significant losses, which is fulfilled here). We vary N by
forced evaporation, which produces BECs for N < 3,000. The
transmission is measured at Δ_C = 0 while sweeping the probe laser,
Δ_L = 2π × (0 ... ±13) GHz. N is determined independently by
absorption imaging (which underestimates the number of atoms,
see Methods). We can either perform the measurement after
transferring the atoms into a combined trap (same parameters as Fig. 2b),
or directly in the magnetic trap, without an optical lattice. The
combined trap destroys BECs, but increases the coupling when loading
large thermal clouds; BECs confined in the magnetic trap alone
remain intact after the measurement. The small, but measurable,
difference shown in Fig. 3a between the BEC (circles) and thermal
cloud (greyscale) spectra is in accord with what is expected from the
different density distributions (see Methods). The vacuum Rabi
splitting reaches 2g_N = 2π × 24 GHz for N ≈ 7,000, corresponding to
C_N = 4.4 × 10⁵. For N < 1,000, the dressed state frequencies E_±/h
have the expected ±√N g_1 dependence, with g_1 ≈ 2π × 200 MHz. The
slower increase of the coupling for higher N in the thermal cloud
spectrum is due to the growing size of the sample. The anticrossing
for N ≈ 2,000 can be understood qualitatively on the reasonable
assumption that a few atoms are in the |F = 1⟩ ground state. Atoms
in |F = 2⟩ have transitions to the upper and lower dressed states at
frequencies ω_A ± g_N, whereas |F = 1⟩ atoms have a transition at
ω_A + Δ_HFS, Δ_HFS = 2π × 6.8 GHz being the ground state hyperfine
splitting of ⁸⁷Rb. An anticrossing appears when the transition
frequencies coincide at g_N = Δ_HFS. However, this simple model, as well
as the model used in ref. 12, cannot explain why the anticrossing
occurs at larger detuning Δ_L ≈ 2π × 8.5 GHz (see Supplementary
Information). A complete understanding of this effect requires
further investigation.
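As a rough illustration of the √N scaling and of the cooperativity quoted in this paragraph, the following sketch evaluates both. The function names are hypothetical; the cavity decay rate κ/2π = 53 MHz is the value given in the Methods Summary, while the atomic half-linewidth γ/2π ≈ 3.03 MHz for the ⁸⁷Rb D2 line is an assumption not stated in this section.

```python
import math

G1_MHZ = 200.0      # single-atom coupling g_1/2pi, as fitted in the text
KAPPA_MHZ = 53.0    # cavity field decay rate kappa/2pi (Methods Summary)
GAMMA_MHZ = 3.03    # assumed atomic half-linewidth gamma/2pi for the Rb D2 line


def collective_coupling_mhz(n_atoms, g1_mhz=G1_MHZ):
    """Ideal collective coupling g_N = sqrt(N) * g_1 for identically coupled atoms."""
    return math.sqrt(n_atoms) * g1_mhz


def cooperativity(g_mhz, kappa_mhz=KAPPA_MHZ, gamma_mhz=GAMMA_MHZ):
    """Cooperativity C = g^2 / (2 * kappa * gamma)."""
    return g_mhz ** 2 / (2.0 * kappa_mhz * gamma_mhz)


# Ideal scaling in the range where the text reports sqrt(N) behaviour.
g_1000 = collective_coupling_mhz(1000)

# Cooperativity for the measured splitting 2 g_N = 2pi x 24 GHz.
c_measured = cooperativity(12_000.0)

print(f"g_N(N = 1000) = {g_1000 / 1000:.2f} GHz")
print(f"C_N for g_N = 12 GHz: {c_measured:.2e}")
```

With these inputs the cooperativity comes out at a few times 10⁵, the same order as the value quoted above. The ideal √N law would give a larger g_N at N ≈ 7,000 than the measured 12 GHz, consistent with the slower increase the text attributes to the growing sample size.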

Figure 3b shows a measurement of the complete dressed-state 
spectrum (equation (1)). Conditions are as in Fig. 3a, but now 
N = 750 is held constant and Δ_C is varied. Again, the observed
resonances are in good agreement with the expected eigenfrequencies
Figure 3 | Map of the energies of the dressed states. a, N is varied for
Δ_C = 0. Greyscale spectra were taken on non-condensed atoms, circles were
measured with BECs. The blue and red curves are the expected resonances at
Δ_L = ±√N g_1, where g_1 is fitted to the non-condensed data for N < 1,000.
Inset, the pictogram shows the orientation of the trap. b, Δ_C is varied for
constant N ≈ 750; the empty cavity transmission (N = 0) is also recorded
and superimposed for reference. The green lines indicate experimental runs
in a and b with common parameters N = 750, Δ_C = 0. Probe intensity:

Nres ~ 1.7 X 10 | (a) and n,., ~ 1.5 X 10 | (b), leading to n ~ 2.9 X 10 * 
(a) and n~ 5.5 X 10 7 (b) with atoms, at transmission peak. 
Figure 4 | Cavity-induced heating of the BEC at different transverse
positions for Δ_L = Δ_C = 0. The pictogram at top right shows the orientation
of the trap. g_N is varied by positioning the BEC at different heights z relative
to the cavity axis. The BEC causes a drop in the cavity transmission (line with
shading under) when it is sufficiently coupled to the mode. The r.m.s. size


E_±/h plotted for g_1 = 2π × 200 MHz. For large cavity detunings
|Δ_C| ≫ g_N, the dressed states of the system evolve towards the
uncoupled states |g:N − 1; e:1; n = 0⟩, where a single excitation is
shared by all the atoms, and |g:N; e:0; n = 1⟩, for which the cavity
contains a photon. (Here |g⟩ and |e⟩ are respectively the atomic
ground and excited states, and n is the intracavity photon number.)
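The behaviour described here can be sketched numerically. Equation (1) is not reproduced in this section, so the code below assumes the standard single-excitation result for N atoms coupled to one cavity mode, with damping neglected: measured relative to the atomic frequency, the two resonances lie at Δ_C/2 ± √(g_N² + Δ_C²/4). The function name is hypothetical.

```python
import math


def dressed_state_detunings(g_n, delta_c):
    """Resonance positions (relative to the atomic frequency) of the two
    dressed states, for collective coupling g_n and cavity detuning delta_c.

    Assumed form (standard coupled-oscillator result, no damping):
        delta_c / 2 -/+ sqrt(g_n**2 + delta_c**2 / 4)
    Both arguments must use the same frequency unit.
    """
    root = math.sqrt(g_n ** 2 + delta_c ** 2 / 4.0)
    return delta_c / 2.0 - root, delta_c / 2.0 + root


# Resonant cavity: symmetric splitting of +/- g_N.
lower_res, upper_res = dressed_state_detunings(g_n=1.0, delta_c=0.0)

# Far-detuned cavity: one branch approaches the bare cavity (delta_c),
# the other approaches the bare atoms, shifted by about -g_N^2 / delta_c.
lower_far, upper_far = dressed_state_detunings(g_n=1.0, delta_c=100.0)

print(lower_res, upper_res)
print(lower_far, upper_far)
```

In the far-detuned case the atom-like branch sits at roughly −g_N²/Δ_C, which is the limit in which the states become the uncoupled atomic and photonic excitations described above.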

Finally, we measure the heating of a BEC caused by the intracavity 
field. To minimize technical heating, we use a purely magnetic trap 
(ν_x = 230 Hz, ν_{y,z} = 2.0 kHz). The BEC contains N ≈ 800 atoms, and
we use n_res ~ 3.9 X 10 *, Δ_L = Δ_C = 0, and an interaction time of
10ms. The BEC remains intact after the interaction with no mea- 
surable loss or heating if we maximize the coupling by positioning it 
on the cavity axis. We can vary the coupling by positioning the BEC at 
different heights z, relative to the centre of the mode. Figure 4 shows 
that the cavity transmission quickly drops to zero as the coupling 
increases, because the strongly coupled system is no longer resonant 
for Δ_L = Δ_C = 0. (Note that the probe light is then mostly reflected
from the non-resonant cavity, not scattered by the atoms.) The heat- 
ing rate of the condensate is measured using time-of-flight imaging 
after the interaction (Fig. 4). It exhibits two peaks corresponding to 
measurements where g_N is high enough for the atoms to be excited,
but still low enough to allow a non-zero intracavity field. In these 
regions the BEC is destroyed after the interaction, whereas it is left 
unaffected for |z| > 18 μm and |z| < 5 μm, corresponding to
gn <2n X10 *MHz and gy>2nX 10°MHz. gy is calculated 
from z and the measured N, averaging over the standing wave.
The observed heating rate can be accounted for with a simple
momentum-diffusion model (see Supplementary Information).
This model predicts a peak heating rate about twice the observed
value of ~400 cycles s⁻¹ per atom, which is a satisfactory agreement
given the uncertainty of our atom number calibration. For maximum
coupling, the model yields a heating rate of 3 × 10⁻² cycles s⁻¹ per
atom, which means that the whole BEC scatters an average of 0.24
photons during the interaction time.
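The closing estimate is simple arithmetic and can be checked directly. The sketch below uses a hypothetical helper and takes the per-atom rate as 3 × 10⁻² cycles s⁻¹, the value that reproduces the quoted 0.24 photons for N ≈ 800 atoms and a 10 ms interaction time.

```python
def photons_scattered(rate_per_atom_hz, n_atoms, interaction_time_s):
    """Mean number of photons scattered by the whole cloud during the interaction."""
    return rate_per_atom_hz * n_atoms * interaction_time_s


# Maximum-coupling case from the text: N ~ 800 atoms, 10 ms interaction.
total_max_coupling = photons_scattered(3e-2, 800, 10e-3)

# For comparison, the observed peak heating rate of ~400 cycles/s per atom.
per_atom_at_peak = photons_scattered(400.0, 1, 10e-3)

print(f"whole BEC at maximum coupling: {total_max_coupling:.2f} photons")
print(f"per atom at the heating peak: {per_atom_at_peak:.1f} photons")
```

At the heating peaks each atom scatters several photons during the interaction, whereas at maximum coupling the whole cloud scatters less than one, which is consistent with the condensate surviving only in the latter case.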

In the experiments reported here, the small size of BECs was essen- 
tial to produce an atomic ensemble with an extremely large, well- 
controlled, and homogeneous coupling rate. Although the quantum 
degeneracy appears to leave no trace in the interaction with photons, 
our system is well-suited to directly study BEC atom statistics®. 
The cavity should allow quantum non-demolition measurement of 
the BEC atom number”, and can be used as a single-atom detector 
along x (filled circles) and z (open circles) of the atom cloud after a 2.8 ms
TOF shows two heating peaks when the BEC is positioned such that
g_N ≈ 2π × 10 MHz, and no detectable heating for large coupling
(g_N > 2π × 1 GHz). Insets, examples of TOF images.


with high quantum efficiency. Such a detector is sensitive to the 
internal atomic state and therefore highly suitable for use as a qubit 
detector. In the regime of atomic ensembles, compared to quantum 
light—matter interface experiments using non-condensed atoms in a 
cavity’®, the BEC additionally offers collisional interaction between 
atoms that can reach large, well-defined values and can be used as a 
resource~’. Proposals exist, for example, to use Raman transitions for 
transferring a small, exactly known number N, of BEC atoms into a 
different internal state*’. With the addition of a transverse laser beam,
the cavity could be used to convert such a state into a N.-photon Fock
state. Another, more technical advantage is the inherent fibre coup- 
ling of our cavities. We expect all of these properties to become 
important in future experiments. 


METHODS SUMMARY 


Our set-up features two FFP cavities mounted on an atom chip with 150 μm
distance between their optical axes and the chip surface (Fig. 1). Both cavities
are tunable independently over a full spectral range with piezoelectric
actuators. Cavity FFP1 used in the experiments has length d = 38.6 μm,
waist radius w_0 = 3.9 μm, finesse F = 37,000 and field decay rate
κ = πc/(2Fd) = 2π × 53 MHz. The calculated maximum single-atom coupling
rate is g_0 = 2π × 215 MHz, yielding a resonant saturation photon number
n_0 = γ²/(2g_0²) = 1 × 10⁻⁴. The transmitted probe and lattice beams are
separated after the output fibre, and the probe beam is detected with a
photon-counting avalanche photodiode. The probe laser can be swept continuously over
a range Δ_L = ω_L − ω_A = ±2π × 15 GHz, where ω_A is the frequency of the
5S_1/2 |F = 2⟩ → 5P_3/2 |F′ = 3⟩ transition of ⁸⁷Rb at λ_A = 780.2 nm. BEC
preparation is similar to our previous work. Absorption imaging inside and below the
cavity is used for temperature and atom number measurements. Several effects
lead to an underestimation of N in these images; atom numbers extracted from
the cavity QED measurements by using calculated g_1 values are systematically
higher by about a factor of 2.
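The cavity figures in this summary follow from the quoted geometry. The sketch below (hypothetical function names) evaluates the field decay rate from κ = πc/(2Fd) and the saturation photon number from n_0 = γ²/(2g_0²). The atomic half-linewidth γ/2π ≈ 3.03 MHz for the ⁸⁷Rb D2 line is an assumption, as it is not given in this section.

```python
import math

C_LIGHT_M_S = 299_792_458.0  # speed of light in vacuum


def field_decay_rate_mhz(finesse, length_m):
    """kappa / 2pi in MHz, from kappa = pi * c / (2 * F * d)."""
    kappa_rad_s = math.pi * C_LIGHT_M_S / (2.0 * finesse * length_m)
    return kappa_rad_s / (2.0 * math.pi) / 1e6


def saturation_photon_number(gamma_mhz, g0_mhz):
    """n_0 = gamma^2 / (2 * g_0^2); both rates in the same unit."""
    return gamma_mhz ** 2 / (2.0 * g0_mhz ** 2)


kappa_mhz = field_decay_rate_mhz(finesse=37_000, length_m=38.6e-6)
n_sat = saturation_photon_number(gamma_mhz=3.03, g0_mhz=215.0)

print(f"kappa / 2pi = {kappa_mhz:.1f} MHz")
print(f"n_0 = {n_sat:.1e}")
```

Both results agree with the rounded values quoted above (about 53 MHz and about 1 × 10⁻⁴).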


Full Methods and any associated references are available in the online version of
the paper at www.nature.com/nature.
METHODS
FFP cavity. For a two-level atom in a symmetric Fabry–Perot cavity (mirror
distance d and radius of curvature r),

\[ g_0 = \sqrt{\frac{3\lambda^2 c\gamma}{\pi^2 w_0^2 d}} = \sqrt{\frac{6\lambda c\gamma}{\pi d\sqrt{2dr - d^2}}} \]

and

\[ C_0 = \frac{3\lambda^2 F}{\pi^3 w_0^2} = \frac{6\lambda F}{\pi^2\sqrt{2dr - d^2}}, \]

where w_0 is the mode waist radius (which depends on
the mirror distance and curvature) and F ≈ π/(T + L) is the cavity finesse, which


depends solely on the intensity transmission T and loss L of each mirror, but 
not on geometry. The second expression for Cy shows that short cavities with 
strongly curved mirrors lead to high cooperativity. In our FFP cavity'®, the 
concave mirror surfaces are realized on the cleaved surfaces of two optical fibre 
tips facing each other (Fig. 1b, c). With this type of cavity, the optical axis can 
approach the chip surface to half a fibre diameter, and maximum coupling is 
achieved by placing the atoms in the gap between the mirrors, so that their 
distance from any material surface remains in the many-micrometre range, 
where long coherent trapping times have been demonstrated. In the second- 
generation FFP fabrication method used here, a laser surface machining process 
is used to shape the mirror surfaces, which has allowed us to achieve a finesse of 
F = 37,000 for the cavity used here, and should ultimately enable F = 150,000 
(for mirrors where transmission equals total loss), based on the measured surface 
roughness. Additionally, this method can produce very small radii of curvature. 
The FFP1 cavity used here has an asymmetric geometry with radii r_1 = 450 μm
and r_2 = 150 μm (measured by atomic force microscopy), and uses a
single-mode (SM) fibre on the input side, while a multi-mode (MM) fibre on
the output side assures high outcoupling efficiency (Fig. 1c). There is a splitting
between modes of orthogonal linear polarization, which is 540 MHz between the
TEM00 modes in our cavity. This allows us to adjust probe beam polarization to
one particular linear polarization. However, owing to the direct fibre coupling,
we are currently unable to determine the axis of this polarization in the y–z plane.
The cavity length d = 38.6 μm (longitudinal mode number n = 99) is confirmed
by two-frequency transmission measurements, and the waist radius w_0 = 3.9 μm
is inferred from this d and the mirror curvatures. These parameters lead to the g_0
value quoted in the main text. With a Rayleigh length of about 60 μm, the cavity
mode is quasi-cylindrical: the variation of the beam radius w(x) is 12%. (In the fit
of Fig. 2b, this variation is taken into account.) The resonance linewidth 2κ is
measured with two frequency-stabilized lasers. The transmission of each cavity 
mirror is T = 31 ± 2 p.p.m. as determined from a reference substrate coated in
the same batch as the fibres. From this value and the measured finesse, we infer
per-mirror intensity losses L = 56 p.p.m. The measured intensity transmission
from before the input of the SM fibre to after the output of the MM fibre is 0.094
for a resonant cavity. Comparison to the calculated transmission of the cavity
(T/(T + L))² = 0.126 indicates that the combined losses of coupling into the SM fibre,
mode-matching into the cavity, and from the cavity to the MM fibre are 0.253, a
low value. The cavity length is actively stabilized to compensate for thermal drifts
of ~1,500 linewidths. The locking scheme uses a correction signal from FFP2 
which is locked on a resonance and subjected to the same thermal perturbations 
as FFP1; residual drifts of a few linewidths are corrected using an error signal 
derived from the 830 nm lattice beam. 
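The mirror budget described in this subsection can be reproduced approximately from the rounded values of T and L. The sketch below is illustrative, with hypothetical function names; it assumes two identical mirrors, the high-finesse approximation F ≈ π/(T + L), and a resonant transmission of (T/(T + L))².

```python
import math


def finesse(transmission, loss):
    """High-finesse approximation for two identical mirrors: F ~ pi / (T + L)."""
    return math.pi / (transmission + loss)


def resonant_transmission(transmission, loss):
    """Peak intensity transmission of a symmetric cavity: (T / (T + L))^2."""
    return (transmission / (transmission + loss)) ** 2


T_MIRROR = 31e-6   # per-mirror intensity transmission (31 p.p.m.)
L_MIRROR = 56e-6   # per-mirror intensity loss (56 p.p.m.)
MEASURED_FIBRE_TO_FIBRE = 0.094

f_est = finesse(T_MIRROR, L_MIRROR)
t_cavity = resonant_transmission(T_MIRROR, L_MIRROR)

# Fraction lost to fibre coupling and mode matching, outside the cavity itself.
coupling_loss = 1.0 - MEASURED_FIBRE_TO_FIBRE / t_cavity

print(f"finesse ~ {f_est:.0f}")
print(f"cavity transmission ~ {t_cavity:.3f}")
print(f"combined coupling loss ~ {coupling_loss:.2f}")
```

Because T and L are rounded here, the outputs match the quoted finesse, cavity transmission and coupling loss only to within a few per cent.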

Atom chip and BEC production. The cavity subassembly is glued onto an atom 
chip, which forms the top wall of a commercial glass cell*'. Sealed fibre feed- 
throughs are formed simply by two slits machined into opposite walls of the cell, 
which are filled with vacuum-compatible epoxy glue once the fibres are in place.
Base pressure in the cell is 3 × 10⁻¹⁰ hPa, comparable to the pressure in similar
cells without cavities. As in our previous work, we use a
mirror-magneto-optical trap (MOT) to precool ⁸⁷Rb atoms. To avoid obscuring of MOT beams
by the cavity subassembly, the horizontal MOT beams are parallel to the x axis
and there is a distance of 11 mm along y between the MOT (position A in Fig. 1a)
and cavity centres. After optical pumping to the |F = 2, m_F = 2⟩ state, initial
magnetic trapping near the MOT location and magnetic transport (which
combines a wire guide and an external quadrupole field), the atoms are further
cooled by forced radio-frequency evaporation in a ‘dimple’ trap (formed by a
wire cross with currents 3 A and 300 mA) between the chip surface and the cavity
mode at position B (Fig. 1b); the BEC is produced 17 μm above the mode axis.
Alternatively, in some experiments, we use surface evaporation**** near one of 
the fibre endfaces, as we found that this increases atom number stability for small 
condensates. The trap geometry is prolate (cigar-shaped); by ramping wire cur- 
rents we can align its long axis parallel or perpendicular to the cavity axis”, as 
required in a particular experiment. Absorption imaging inside and above the 
cavity is possible with a probe beam in the y—z plane, which subtends an angle of 
30° with the chip surface and is reflected by the dielectric coating on the chip 
before its passage through the cavity (Fig. 1d). This reflection limits the achiev- 
able purity of the desired circular polarization; furthermore, camera noise in 
conjunction with the magnification of about 4 forces us to use a relatively high 
imaging beam intensity that causes some saturation. We have not attempted to 
correct these effects, all of which lead to an underestimation of N. Atom numbers 
extracted from the cavity QED measurements by using calculated g, values are 
systematically higher by about a factor of 2. 

BEC coupling strength. In Fig. 3a the circles show the coupling strength g_N for
BECs held in a magnetic trap (ν_{x,z} = 2.7 kHz, ν_y = 230 Hz, bias field B_y ≈ 1 G),
with atom numbers N ranging from 260 to 950. g_N is reduced by a factor of about
0.86 (mean value over the seven measurements) with respect to the values
measured for thermal samples in the combined trap (same trap parameters as in
Fig. 2b). The expected reduction in g_N is calculated using the density
distributions of the BECs in the one-dimensional/three-dimensional crossover regime
and the density distribution of thermal clouds at 4.4 μK, as obtained from the fit
in Fig. 2b. This assumes that the final temperature of the sample is independent 
of N when loading BECs. We find a reduction by a factor 0.83 (mean value), in 
good agreement with the measured value. For these measurements a very weak 
probe beam is used, 1,., = 6.3 X 10 *, in order to leave the condensate un- 
affected. For each of the seven data points, we average over four identical experi- 
mental runs to obtain a clear transmission signal. 
Stability of organic carbon in deep soil layers
controlled by fresh carbon supply


Sébastien Fontaine’, Sébastien Barot’, Pierre Barré’, Nadia Bdioui', Bruno Mary* & Cornelia Rumpel° 


The world’s soils store more carbon than is present in biomass and
in the atmosphere. Little is known, however, about the factors
controlling the stability of soil organic carbon stocks, and the
response of the soil carbon pool to climate change remains
uncertain. We investigated the stability of carbon in deep soil layers in
one soil profile by combining physical and chemical
characterization of organic carbon, soil incubations and radiocarbon dating.
Here we show that the supply of fresh plant-derived carbon to the
subsoil (0.6–0.8 m depth) stimulated the microbial mineralization
of 2,567 ± 226-year-old carbon. Our results support the previously
suggested idea that in the absence of fresh organic carbon, an
essential source of energy for soil microbes, the stability of organic
carbon in deep soil layers is maintained. We propose that a lack
of supply of fresh carbon may prevent the decomposition of the
organic carbon pool in deep soil layers in response to future changes
in temperature. Any change in land use and agricultural practice
that increases the distribution of fresh carbon along the soil
profile could however stimulate the loss of ancient buried carbon.

The soil reservoir of organic carbon corresponds to 615 GtC in
the top 0.2 m layer and 2,344 GtC at depths of up to 3 m, which is
more than biomass and atmospheric CO₂ combined. The mean
residence time (MRT) of soil organic carbon (SOC) increases
strongly with depth, reaching values of 2,000–10,000 yr in deep soil
layers (>0.2 m). However, little is known about the factors
controlling the stability of carbon in deep soil layers. Improved
knowledge of these factors is essential to determine whether this pool of
carbon will react to global change and accelerate the increase in
atmospheric CO₂.

We investigated the stability of carbon in deep soil layers in a soil 
profile located within the research observatory on grasslands set up 
by the French National Institute for Agricultural Research in 2003 
(Massif Central, France). This site has been under grassland for
>50 yr, and was covered with forests of chestnut and hornbeam
2,000 yr ago. Radiocarbon (¹⁴C) dating suggests that SOC stored
in deep layers is derived from these old forests (Table 1). SOC content
declines with depth, but 77 ± 1% (mean ± s.e.m.) of the soil
reservoir of carbon is below 0.2 m (Supplementary Table 1). The soil is a
drained Cambisol developed from granitic rock. Cambisols, which
are relatively rich in deep C and cover 10% of the terrestrial surface,
are the second most widespread soil type of the world after Leptosols,
which are poor in deep C.

It has been widely demonstrated that the chemical nature of 
organic compounds may control the intensity of decomposer acti- 
vities and rates of degradation. To test whether the stability of deep C 
is due to its inherent chemical structure, soil samples from the surface 
layer (0–0.2 m) and a deeper layer (0.6–0.8 m) were collected. ¹⁴C
content was measured to date SOC and determine its MRT using a
model of flux (Methods). The chemical composition of SOC in the
two layers was analysed by ¹³C CPMAS (cross polarization with
magic-angle spinning) NMR (nuclear magnetic resonance) and by 
FTIR (Fourier transform infrared) spectroscopy to determine 
whether changes in MRT with depth may result from a shift in the 
chemical composition of SOC. 

The ¹⁴C content of SOC declined with depth, from 100.2 ± 0.4% of
modern carbon (MC%) in the surface layer to 77.9 ± 0.3 MC% in the
subsoil. The ¹⁴C dating and the calculation of MRT of SOC both gave
consistent results (Table 1). The surface layer was dominated by
young fast-cycling carbon (320 ± 27 yr) whereas the subsoil was
dominated by ancient slow-cycling carbon (2,560 ± 74 yr), often
referred to as the passive fraction of SOC. This result indicates that
the decomposition of SOC is strongly reduced at depth.

These differences in MRT were not mirrored by changes in the
chemical composition of SOC. Figure 1 shows that the ¹³C CPMAS
NMR spectra of both layers were similar. They were characterized by
dominant signals in the O-alkyl C region, which are generally
assigned to amide C of proteins and to C2, C3, C5 of polysaccharides.
The resonance centred in the alkyl C region indicated the


Table 1| Properties of the two soil layers 


Property Layer depth Layer depth 
0-0.2m 0.6-0.8m 
pH 6.1+0.1 6.7+0 
Clay (%) 27+1 3441 
Clay minerals* (O22 6541 
Kaolinite 2e1 2642 
HIV beet 9+1 
Illite 
Oxides 27+0.1 36+0.1 
Fe (gkg *) 6.5+0.2 7.6+0.0 
Al (gkg~*) 
SOC content (gC kg” 1) 3241 233405 
SOC bound to minerals (% of total) 50+0.5 5841 
SOC 81°C (%o) -27.4£04  -25.9+04 
soc C content (MC%) 100.2 +0.4 77.9 £03 
SOC “C-age (yr BP) Modern 2,007 + 31 
SOC MRT (yr) 320 + 27 2,560 + 74 
Root (gC kg +) 3:9:)0'5 0.008 + 0.002 
Root production of fresh litter} (gCkg tyr 4) 43+06 0.009 + 0.002 
POM content (>200 um; gC kg +) 1820.7 0.016 + 0.05 
POM ““C content (MC%) ND 109 +28 
POM MRT (yr) ND 64+ 4,1 
Microbial biomass (mg C kg 1) 853211 193 +22 


Soil properties, clay mineralogy, iron (Fe) and aluminium (Al) oxides, ¹⁴C content, ¹⁴C age and
mean residence time (MRT) of soil organic carbon (SOC) and particulate organic matter
(POM). Values are given as mean ± s.e.m. (n = 3 for all analyses except ¹⁴C analysis of SOC and
POM, n = 2). MC%, percentage of modern carbon; HIV, hydroxy-interlayer vermiculite; ND, not
determined.
* Relative peak area of diffractograms in %. 

+ Root production of fresh litter was calculated as (root density)/(root MRT). Root MRT (0.9 yr) 
was calculated for the same grassland exposed to °C labelled CO> (ref. 28). 

‡ The model yielded two possible MRT, 6.4 ± 4.1 yr and 101 ± 35 yr, but the latter was excluded
as it was inconsistent with the MRT calculated as (POM content)/(POM input flux)
(Supplementary Data 2).


¹INRA, UR 874 Agronomie, 234 Avenue du Brézet, 63100 Clermont-Ferrand, France. ²IRD, UMR 137, 32 Avenue H. Varagnat, 93143 Bondy, France. ³BIOEMCO, UMR 7618, CNRS-
INRA-ENS-Paris 6, Bâtiment EGER, Aile B, 78820 Thiverval-Grignon, France. ⁴INRA, UR 1158 Agronomie, Rue Fernand Christ, 02007 Laon, France.
presence of methyl C in long chain aliphatic compounds, derived
from lipids. Both soil layers showed signals in the aromatic region
of the spectra (C substituted aryl C, and O substituted aryl C),
indicative of C derived from lignin and charcoal. The only significant
difference, though not a quantitatively important one, was a higher
contribution of C substituted aryl C in the subsoil (10 ± 0.0%) than in
the surface soil (8.7 ± 0.4%) (P < 0.02, Supplementary Table 2).
The FTIR spectra also indicated that the SOC chemical composition
does not change markedly with depth (Supplementary Data 1),
suggesting that the stability of SOC in the subsoil is not due to the
chemical structure of SOC itself.

The deep carbon may persist because it is bound to soil minerals
and exists in forms that decomposers cannot access. Table 1
shows that the proportion of SOC bound to minerals increased
slightly with depth, from 50 ± 0.5% for the surface layer to
58 ± 1% for the subsoil. Could this 8% change explain the shift in
MRT of SOC with depth? We used a simple model (Supplementary
Method 1) to show that the only way to simulate the shift in MRT of
SOC with depth is to assume a large change in the type of
organo-mineral associations, that is, organo-mineral complexes at depth
must be ten times more stable than at the surface. This assumption
is not supported by our results. Table 1 shows that clay mineralogy,
which was dominated by kaolinite, did not change markedly with
depth. Fe and Al oxides and oxyhydroxides, which play a role in the
preservation of SOC, increased with depth (Fe ×1.3, Al ×1.2), but
not to the extent imposed by our model (×10). Thus, the stability of
SOC in the subsoil cannot entirely be ascribed to SOC fixation on
minerals.

The slow SOC decomposition at depth could result from inappro- 
priate conditions for microbes, such as a lack of oxygen. To test 
whether the overall conditions found at depth allow microbial 
activities, we determined MRT of particulate organic matter 
(POM > 200 μm). If microbial activities are possible, the small pool
of root litter should be quickly recycled and dominated by recent C.
This was confirmed by the results, as the MRT of POM in the deep
layer was 6.4 ± 4.1 yr, in sharp contrast with SOC (Table 1).

Under these circumstances, how can deep SOC escape from 
microbial degradation? A new theory of SOC dynamics’ has pro- 
posed that the slow SOC turnover at depth results from scarcity of 
fresh C (plant litter and exudates). Based on many empirical 
results'°*°, this theory assumes that soil humus is the result of the 
Figure 1 | ¹³C CPMAS NMR spectra of soil carbon for the two soil layers.
The chemical shift regions 0–45 p.p.m., 45–110 p.p.m., 110–140 p.p.m.,
140–160 p.p.m. and 160–220 p.p.m. were referred to respectively as alkyl C,
O-alkyl C, C substituted aryl C, O substituted aryl C and carboxylic C.




NATURE| Vol 450|8 November 2007 


long-term accumulation of biochemically recalcitrant compounds 
having low energy content. Near the soil surface, microbes are able 
to decompose these compounds with their enzymes because they use 
fresh C as source of energy. In deep soil layers, however, fresh-C 
inputs by plants are extremely low (for example, fresh-litter depos- 
ition by roots was 478 times lower in our subsoil than in the surface 
layer; Table 1). Under these conditions, the theory predicts that 
acquisition of energy from recalcitrant compounds cannot sustain 
microbial activity, and soil decomposition is strongly reduced. This 
prediction can be experimentally tested: if the theory is right, then the 
delivery of fresh C to the subsoil should activate mineralization of 
ancient C. 
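The contrast in fresh-carbon supply quoted above follows directly from the Table 1 entries. As a small sketch with a hypothetical helper, the code below applies the footnote's formula, production = root density / root MRT, using the subsoil root density of 0.008 gC kg⁻¹ and the root MRT of 0.9 yr, and then forms the surface-to-subsoil ratio from the tabulated production rates (4.3 and 0.009 gC kg⁻¹ yr⁻¹).

```python
def litter_production(root_density_gc_per_kg, root_mrt_yr):
    """Fresh-litter production by roots (gC per kg soil per year),
    calculated as root density divided by root mean residence time."""
    return root_density_gc_per_kg / root_mrt_yr


ROOT_MRT_YR = 0.9  # root mean residence time given in the Table 1 footnote

# Subsoil production from the tabulated root density.
subsoil_production = litter_production(0.008, ROOT_MRT_YR)

# Ratio of the tabulated production rates, surface layer versus subsoil.
surface_to_subsoil = 4.3 / 0.009

print(f"subsoil litter production ~ {subsoil_production:.4f} gC/kg/yr")
print(f"surface / subsoil ratio ~ {surface_to_subsoil:.0f}")
```

The ratio rounds to the factor of 478 quoted in the text.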

To test this theory, we incubated soil sampled from the deep layer 
(0.6—0.8 m) with cellulose, which is the main component of plant 
litter. We used a novel technique based on dual labelling of cellulose 
(¹³C, ¹⁴C) in order to trace decomposition of cellulose C and SOC,
and to determine the mean age of SOC mineralized by microbes 
(Methods). Soil without cellulose was also incubated as a control. 

Incubation of control soil released CO₂ (Fig. 2a), indicating that
there were metabolically active microbes in the subsoil and that a
fraction of deep C was degraded. The ¹⁴C dating of CO₂ produced
during the incubation showed that microbes in the control soils
mostly decomposed recent organic matter (Table 2). Nevertheless,
it is likely that decomposers mineralized a small amount of old C
together with recent plant litter because the ¹⁴C content of respired
CO₂ was lower than that of current atmospheric CO₂ (~105 MC%).

The addition of cellulose stimulated microbial respiration and 
growth, demonstrating that microbes were limited by energy 
Figure 2 | Effect of cellulose supply on respiration and biomass of microbes
of the deep layer. a, b, Cumulative respiration of unlabelled soil carbon
(a) and total microbial biomass (b). The difference in soil carbon respiration
between control and cellulose-amended soil represents the priming effect.
Inset, respiration rate of labelled cellulose. The decrease in respiration after
day 26 indicates the exhaustion of cellulose. Values are given as
mean ± s.e.m. (n = 3).
Table 2 | Properties of unlabelled soil C released during incubation


Quantity (mgC kg⁻¹) ¹⁴C activity (MC%) ¹⁴C age (yr BP)
Control soil 100 +4 97 +14 222 (+119/-117) 
Soil with cellulose 723: 85+1.6 1,329 (+154/-152) 
Priming effect 72+2 732 2,567 (+226/—219) 


Quantity, ¹⁴C activity and ¹⁴C age of unlabelled soil C released as CO₂ by the control, the soil with
cellulose and the priming effect during the 161 days of incubation of the subsoil. Values are given
as mean ± s.e.m. (n = 3). Standard errors of ¹⁴C age are asymmetric owing to the exponential
decay of ¹⁴C.


(Fig. 2). The stimulation of decomposers induced a significant 
(P < 0.001, analysis of variance, ANOVA) increase in production 
of unlabelled soil-originated CO₂, an effect known as priming. 
It has been previously proposed that the priming effect may result 
from decomposition of recalcitrant old SOC by stimulated microbes. 
However, this hypothesis has not been demonstrated. Here, we 
show that the ¹⁴C content of the CO₂ derived from SOC decreased 
significantly (P < 0.01, ANOVA) after cellulose supply, demonstrating 
that cellulose-stimulated decomposers degraded very old C 
(Table 2). Calculations indicated that the pool of carbon decomposed 
by the priming effect was 2,567 ± 226 yr old. 

Both the total microbial biomass and the priming effect signifi- 
cantly decreased (P< 0.01, ANOVA) with the exhaustion of cellulose 
(Fig. 2). Thus, although decomposers are able to decompose ancient 
C, the acquisition of energy from such substrate is not sufficient to 
sustain long-term biological activity. Mechanistically, this suggests 
that the energy required to break down the recalcitrant SOC (for 
example, extracellular enzyme production) is higher than the energy 
supplied by the catabolism of such substrate. As a result, the long- 
term activity of decomposer populations depends on a permanent 
supply of fresh C (ref. 7). 

The addition of cellulose, at a rate representing about one-quarter 
of the annual fresh litter C deposition into the surface layer by plant 
roots (Table 1), led to the mineralization of 72 ± 2 mgC per kg soil 
of old SOC (Table 2). If this cellulose supply and the resulting 
priming effect were repeated each year, we calculated that, under 
laboratory conditions, the MRT of SOC in the subsoil would be 
23,300/72 = 324 yr, which is very close to the MRT found in the 
surface layer (Table 1). 
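
The arithmetic behind this estimate can be sketched as follows. This is a minimal illustration, assuming that 23,300 is the SOC content of the subsoil in mg C per kg soil (the unit is inferred from the division, not stated here) and that the primed loss of 72 mg C per kg recurs once per year. The function name is hypothetical.

```python
def mean_residence_time(soc_stock_mg_per_kg, annual_loss_mg_per_kg):
    """Mean residence time (yr) as pool size divided by annual loss."""
    if annual_loss_mg_per_kg <= 0:
        raise ValueError("annual loss must be positive")
    return soc_stock_mg_per_kg / annual_loss_mg_per_kg


# Assumed inputs: 23,300 mg C per kg soil lost at 72 mg C per kg soil per year.
mrt_years = mean_residence_time(23_300, 72)
print(round(mrt_years))  # 324
```

The estimate is a simple pool-over-flux ratio, so it holds only for as long as the annual loss stays constant.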

The present results show that the stability of SOC in the studied 
deep layer reflects a lack of fresh C for microbes. We cannot be certain 
that the limitation by fresh C detected here, with this deep soil, will 
also be the dominant control of SOC stability in other deep soils. We 
therefore encourage research in other soils to quantify the relative 
role of the mechanisms studied here at the global scale. However, given that 
the breakdown of SOC is limited by the availability of fresh C in most 
soils, and that the low fresh-C availability at depth relies on a 
fundamental property of ecosystems (plants live and incorporate 
most of their litter at the surface), our results can probably be 
generalized to many well-drained deep soils. They cannot apply to 
waterlogged peat soils, because decomposition there is primarily 
constrained by a lack of oxygen. 

Our results have several implications. First, they suggest that bio- 
logical and physical processes that bury recalcitrant SOC below the 
deposits of fresh C protect it from decomposition and allow C storage 
over millennia (Supplementary Fig. 1). This mechanism provides an 
interesting alternative to current approaches that involve short-term 
storage of carbon in vulnerable compartments (plant biomass, surface 
SOC). Second, our incubation results suggest that, even under 
favourable conditions of temperature and moisture for microbial 
activities, SOC from the deep soil does not provide enough energy 
to sustain active microbial populations and thereby the production 
of enzymes. The existence of this energetic barrier could reduce or 
cancel the effect of future changes in temperature on the decomposi- 
tion of this large pool of deep C, in contradiction to the predicted 
effect based on the temperature-induced acceleration of enzymatic 
reactions. Last, our results show that deep SOC decomposition may 
be reactivated. Changes in land use and agricultural practices (for 
example, deep ploughing versus conservation tillage, use of 
drought-resistant crops with deep root systems) that increase the distribution 
of fresh C at depth could stimulate loss of this ancient buried 
carbon. 


METHODS SUMMARY 

Characterization of the soil profile. Three independent soil samples were collected 
in layers of 0.2 m down to the depth at which unweathered parent material 
was encountered. Organic C and bulk density were measured in each layer to 
determine SOC content and storage. We studied the stability of deep SOC by 
focusing on two contrasting layers, that is, 0–0.2 m and 0.6–0.8 m. A subsample of 
intact soil was used for the determination of POM; the remainder was sieved. 
The sieved soil was used to determine pH, clay content, total nitrogen, microbial 
C (ref. 24), age and MRT of SOC, chemical and physical composition of SOC, 
and to conduct the incubation. 

Soil analyses. ¹⁴C content of SOC and POM (>200 µm) was measured by liquid 
scintillation counting. The ¹⁴C age was calculated with the Libby half-life. To 
determine the MRT of SOC and POM, we used the measured ¹⁴C content to 
constrain a flux model. The chemical composition of SOC was then analysed by 
CPMAS ¹³C NMR and by FTIR spectroscopy (Supplementary Data 1). The 
amount of C bound to soil minerals was estimated by the demineralization 
technique. Clay mineralogy was determined by X-ray diffraction of oriented 
samples. Iron and aluminium oxides and oxyhydroxides were estimated by the 
dithionite-citrate-bicarbonate method. 

Incubation experiment. Fresh sieved soils were incubated at 20 °C and at a water 
potential of −100 kPa for 161 days. Dual-labelled cellulose (δ¹³C = 1,860‰, ¹⁴C 
activity = 2 MC%) was mixed with half of the incubated soils at 1 g cellulose C per 
kg of soil. The other half without cellulose (control soils) was also mixed to apply 
the same disturbance. The CO₂ evolved was trapped in NaOH and measured by 
continuous flow colorimetry. The ¹³C and ¹⁴C content of CO₂ was analysed after 
precipitating the carbonates with excess BaCl₂. ¹³C and ¹⁴C analysis of carbonates 
was carried out by IRMS (isotope ratio mass spectrometry) and AMS (accelerator 
mass spectrometry), respectively. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS 

Characterization of the soil profile. Three soil samples were collected at a 
distance of 1–2 m from each other, and in layers of 0.2 m down to the depth 
at which unweathered parent material was encountered (1 m) (Supplementary 
Fig. 1). Each replicate was analysed separately. Organic C and bulk density were 
measured for each horizon to determine SOC content and storage at each depth. 
We studied the deep SOC stability by focusing on two contrasting layers, that is, 
0–0.2 m and 0.6–0.8 m. A subsample of intact soil was used for the determination 
of POM; the remainder was sieved (2 mm) and visible plant residues were 
removed (sieved soil). After this treatment, POM (>200 µm), soluble C and 
microbial C accounted for less than 4.5% of total C in both soil layers, indicating 
that the remaining C is dominated by humified SOC. The sieved soil was used to 
determine pH, clay content, total nitrogen, microbial C, age and MRT of SOC, 
chemical and physical composition of SOC, and to conduct the incubation 
experiment. All the analyses were made in three replicates per soil layer, 
except the ¹⁴C analyses of SOC and POM, which were made in two replicates per soil 
layer. 

Chemical characterization of soil carbon. Soils were treated with 10% HF to 
concentrate organic C and to remove paramagnetic compounds. Solid-state 
¹³C NMR spectra were obtained on a Bruker DSX-200 NMR spectrometer. 
CPMAS was applied at 6.8 kHz. The solid-state ¹³C NMR signal was recorded as free 
induction decay and Fourier-transformed to yield an NMR spectrum. 

¹⁴C dating and estimation of the mean residence time of soil carbon. Soils were 
treated with dilute HCl before ¹⁴C measurements to remove any traces of 
carbonates. ¹⁴C content of SOC and POM, measured by liquid scintillation 
counting at the Centre de Datation par le Radiocarbone (France), is expressed 
in percentage of modern carbon (MC%), which expresses the sample's 
¹⁴C/¹²C ratio as a percentage of that of oxalic acid in 1950. Fast-cycling C 
(turnover time, years to centuries) has values >100 MC% because it has incorporated 
a significant proportion of ¹⁴C emitted by nuclear bomb testing. Before weapons testing, 
atmospheric ¹⁴C was approximately 100 MC%. Slow-cycling C (turnover time, 
millennia) has values <100 MC% because of radioactive decay of ¹⁴C. The ¹⁴C age 
was calculated with the Libby half-life (5,568 yr) and expressed in years before 
present (yr BP). To determine the MRT of SOC and POM, we used the measured 
¹⁴C content to constrain a flux model. The ¹⁴C content of SOC was modelled as: 


$$
A^{14}_{\mathrm{SOC}}(p) = \frac{\sum_{i=b}^{p} M_i \, e^{-(p-i)/\mathrm{MRT}} \, A^{14}_{i} \, e^{-\lambda (p-i)}}{\sum_{i=b}^{p} M_i \, e^{-(p-i)/\mathrm{MRT}}}
$$

where $M_i$ is the amount of new SOC input to the soil in year $i$, MRT the mean 
residence time of SOC, $p$ the year when the soil sample is taken, $b$ the year when 
the simulation starts, $A^{14}_{i}$ the ¹⁴C activity in the atmosphere in year $i$ and $\lambda$ the ¹⁴C 
decay rate (1/8,268 yr⁻¹). See Supplementary Method 2 for more details. 
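
The flux model can be sketched in a few lines of Python. This is an illustrative reading of the weighted-sum equation, not the authors' code (which is described in Supplementary Method 2). The function name and the list-based layout, with one entry per year from b to p, are assumptions made for the example.

```python
import math

DECAY_RATE = 1 / 8268  # 14C decay rate per year, as given in the text


def modelled_soc_14c(inputs, atm_activity, mrt, decay_rate=DECAY_RATE):
    """Modelled 14C activity (MC%) of SOC in the sampling year p.

    inputs[k] and atm_activity[k] refer to year b + k; the last entry is year p.
    """
    if len(inputs) != len(atm_activity):
        raise ValueError("inputs and atm_activity must have the same length")
    last = len(inputs) - 1
    numerator = 0.0
    denominator = 0.0
    for k, (m_i, a_i) in enumerate(zip(inputs, atm_activity)):
        elapsed = last - k  # p - i, in years
        surviving = m_i * math.exp(-elapsed / mrt)  # input of year i still present
        numerator += surviving * a_i * math.exp(-decay_rate * elapsed)
        denominator += surviving
    return numerator / denominator


# Hypothetical scenario: constant inputs under a constant 100 MC% atmosphere.
years = 20_000
print(modelled_soc_14c([1.0] * years, [100.0] * years, mrt=300))
```

In practice the MRT would be found by adjusting `mrt` until the modelled activity matches the measured one, using a real atmospheric ¹⁴C record in place of the constant series assumed here.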

Carbon bound to soil minerals. Soils were treated with 10% HF to remove 
mineral material and release mineral-associated carbon. The carbon lost upon 
the demineralization procedure is assumed to represent the carbon bound to soil 
minerals. 

Clay mineralogy, content of iron and aluminium oxides. Clay mineralogy was 
determined by X-ray diffraction (XRD) of oriented samples. The diffractograms, 
shown in Supplementary Fig. 2, were obtained with a Philips diffractometer 
using Cu radiation. Peak areas of the clay minerals were measured and reported 
in percentages of total peak area to compare the (semiquantitative) XRD 
diffractograms. Iron and aluminium oxides and oxyhydroxides were estimated by 
the dithionite-citrate-bicarbonate method. 
Incubation experiment. Experimental units consisted of 60 g (oven-dried basis) 
samples of fresh sieved soils placed in 500 ml flasks and incubated at 20 °C for 161 
days. The moisture content of the soil was adjusted to a water potential of 
−100 kPa with a nutrient solution (NH₄NO₃, KH₂PO₄). After 15 days of pre-incubation, 
1 g cellulose C per kg soil of dual-labelled cellulose (δ¹³C = 1,860‰, 
¹⁴C activity = 2 MC%) was mixed with half of the incubated soils. The other half 
without cellulose (control soils) was also mixed to apply the same disturbance. 
The CO₂ evolved was trapped in NaOH and was measured by continuous flow 
colorimetry. δ¹³C of CO₂ was analysed by an elemental analyser coupled to a mass 
spectrometer after precipitating the carbonates with excess BaCl₂ and filtration. 
Microbial C and its δ¹³C were determined by the fumigation-extraction technique. 
Measurements of ¹⁴C in CO₂ were conducted on a separate set of flasks. 
In this set, the NaOH solutions taken at each sampling date were kept under a 
CO₂-free atmosphere until the end of incubation. These samples were then pooled 
together to produce a single sample, which received BaCl₂. ¹⁴C analysis of carbonates 
and cellulose was carried out by AMS (Radiocarbon Laboratory of Poznan, 
Poland). 

Calculations. The ¹⁴C content of CO₂ emitted in the control soil was transformed 
into age BP using the Libby half-life of ¹⁴C. Measurement of the ¹⁴C 
content of total CO₂ emitted in soils with cellulose does not allow for the calculation 
of the age of SOC released as CO₂ because of the two sources of carbon 
(cellulose, SOC). We circumvented this problem using dual labelling of cellulose. 
The ¹³C labelling of cellulose allowed the separation of soil C ($R_S$) and cellulose 
($R_C$) respiration (mg CO₂-C per kg soil) using mass balance equations: 

$$R_S A_S + R_C A_C = R_T A_T$$

$$R_S + R_C = R_T$$

where $A_S$ is the ¹³C abundance (dimensionless) of soil carbon, $A_C$ the ¹³C abundance 
of cellulose, $R_T$ the total CO₂ emitted by soil with cellulose and $A_T$ its ¹³C 
abundance. 

Then, we calculated the ¹⁴C content (expressed in MC%) of SOC released as 
CO₂ ($A^{14}_S$) with the equation 

$$R_S A^{14}_S + R_C A^{14}_C = R_T A^{14}_T$$

where $A^{14}_C$ is the ¹⁴C content of cellulose and $A^{14}_T$ the ¹⁴C content of the total CO₂ 
emitted by soil with cellulose. $A^{14}_S$ was converted into age BP using the Libby half-life 
of ¹⁴C. 

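The priming effect (PE, mg CO₂-C per kg soil) induced by the addition of 
cellulose was calculated as: 

$$\mathrm{PE} = R_S(\text{soil with cellulose}) - R_S(\text{control soil})$$

We calculated the ¹⁴C content of the pool of SOC decomposed via the priming 
effect ($A^{14}_{PE}$) as 

$$A^{14}_{PE} = \frac{R_S(\text{soil with cellulose}) \, A^{14}_S - R_S(\text{control soil}) \, A^{14}_{control}}{\mathrm{PE}}$$

where $A^{14}_S$ is the ¹⁴C content of SOC released as CO₂ in the soil with cellulose and 
$A^{14}_{control}$ is the ¹⁴C content of CO₂ emitted in the control soil. $A^{14}_{PE}$ was converted 
into age BP as previously described. 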
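
The chain of calculations above can be sketched as follows. This is a minimal illustration rather than the authors' implementation. All function names are hypothetical, the numbers in the usage lines are invented for the example, and the age conversion assumes the conventional relation age = −(5,568 / ln 2) × ln(A / 100) for the Libby half-life.

```python
import math

LIBBY_MEAN_LIFE = 5568 / math.log(2)  # years, from the Libby half-life of 5,568 yr


def age_bp(mc_percent):
    """Conventional 14C age (yr BP) from an activity in MC%."""
    return -LIBBY_MEAN_LIFE * math.log(mc_percent / 100.0)


def partition_respiration(r_total, a_total, a_soil, a_cellulose):
    """Solve the two 13C mass-balance equations for (R_S, R_C)."""
    r_cellulose = r_total * (a_total - a_soil) / (a_cellulose - a_soil)
    return r_total - r_cellulose, r_cellulose


def soil_14c(r_soil, r_cellulose, a14_total, a14_cellulose):
    """14C content (MC%) of the SOC-derived CO2 in the amended soil."""
    r_total = r_soil + r_cellulose
    return (r_total * a14_total - r_cellulose * a14_cellulose) / r_soil


def priming_effect(r_soil_amended, a14_soil_amended, r_soil_control, a14_control):
    """Return (PE, 14C content of the SOC pool decomposed by priming)."""
    pe = r_soil_amended - r_soil_control
    a14_pe = (r_soil_amended * a14_soil_amended - r_soil_control * a14_control) / pe
    return pe, a14_pe


# Invented example values (not data from the study).
r_s, r_c = partition_respiration(r_total=500.0, a_total=0.020, a_soil=0.011, a_cellulose=0.031)
a14_s = soil_14c(r_s, r_c, a14_total=50.0, a14_cellulose=2.0)
pe, a14_pe = priming_effect(r_s, a14_s, r_soil_control=150.0, a14_control=97.0)
print(pe, a14_pe, age_bp(a14_pe))
```

Because the cellulose carries almost no ¹⁴C (2 MC%), even a small error in the ¹³C-based partition shifts the inferred soil ¹⁴C content, which is why the two labels have to be measured on the same CO₂.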
Detection of stratospheric ozone intrusions by windprofiler radars 


W. K. Hocking, T. Carey-Smith, D. W. Tarasick, P. S. Argall, K. Strong, Y. Rochon, I. Zawadzki & P. A. Taylor 


Stratospheric ozone attenuates harmful ultraviolet radiation and 
protects the Earth's biosphere. Ozone is also of fundamental 
importance for the chemistry of the lowermost part of the atmosphere, 
the troposphere. At ground level, ozone is an important 
by-product of anthropogenic pollution, damaging forests and 
crops, and negatively affecting human health. Ozone is critical 
to the chemical and thermal balance of the troposphere because, 
via the formation of hydroxyl radicals, it controls the capacity of 
tropospheric air to oxidize and remove other pollutants. 
Moreover, ozone is an important greenhouse gas, particularly in 
the upper troposphere. Although photochemistry in the lower 
troposphere is the major source of tropospheric ozone, the 
stratosphere-troposphere transport of ozone is important to 
the overall climatology, budget and long-term trends of tropospheric 
ozone. Stratospheric intrusion events, however, are 
still poorly understood. Here we introduce the use of modern 
windprofiler radars to assist in such transport investigations. 
By hourly monitoring the radar-derived tropopause height in 
combination with a series of frequent ozonesonde balloon 
launches, we find numerous intrusions of ozone from the 
Figure 1 | Simultaneous radar and ozonesonde measurements from the 
Montreal Campaign of April–May 2005. a, Altitude–time intensity plot of 
backscattered radar power observed with the McGill windprofiler radar, 
expressed as relative power in decibels. Absolute maximum values of 
backscattered powers occurred in the lower atmosphere, but a secondary 
maximum appeared in the altitude region between 6 and 14 km. The lower 
edge (region of largest local power gradient as a function of height) of this 
secondary maximum (green in the figure) is shown as a black line. This 
represents the height of the tropopause, as has been shown in a variety of 
studies. Radar data were recorded from 28 April to 11 May, but only a 
subset of the radar data are shown. b, Ozone densities (measured in parts per 


stratosphere into the troposphere in southeastern Canada. On 
some occasions, ozone is dispersed at altitudes of two to four kilo- 
metres, but on other occasions it reaches the ground, where it can 
dominate the ozone density variability. We observe rapid changes 
in radar tropopause height immediately preceding these intrusion 
events. Such changes therefore serve as a valuable diagnostic for 
the occurrence of ozone intrusion events. Our studies emphasize 
the impact that stratospheric ozone can have on tropospheric 
ozone, and show that windprofiler data can be used to infer the 
possibility of ozone intrusions, as well as better represent tropo- 
pause motions in association with stratosphere-troposphere 
transport. 

Ozone enters the troposphere from the stratosphere as part of the 
Brewer–Dobson circulation, through episodic events, but details of 
the process are not well known. In this work, we observed a number 
of such events, using a unique combination of windprofiler radars, 
frequent ozonesonde launches and computer modelling. Windprofiler 
radars at two locations were used to provide tropopause heights 
and vector winds as a function of time, at a resolution of typically one 
hour, as described in the Methods Summary and illustrated in Fig. 1a. 
billion) are plotted as a function of height and time for the period 29 April to 
10 May. Each vertical column of coloured boxes represents a different 
launch. The tropopause height as determined by the radar is marked as the 
solid black line at 6-14 km altitude. Regions of rapid tropopause ascent are 
labelled as A, B, C and D. Stratospheric ozone intrusion trajectories are 
highlighted approximately by the hand-drawn pink arrows (A—A’, B-B’, 
C-C’, D-D’), although these should only be taken as a guide, because the 
detailed trajectories are complicated by dispersive processes, by the presence 
of pre-existing tropospheric ozone, and, in cases C and D, by the close 
proximity of the two jumps (making their effects hard to separate). 
The radars used were located at Montreal, Quebec (45.4°N, 73.9°W), 
and Walsingham, Ontario (42.6°N, 80.6°W). They had steerable 
beams with one-way beam half-power half-widths of 2.1°, which 
could be pointed vertically, or at 10.9° off-vertical in various azi- 
muthal directions. The radio frequencies used were 52.00 MHz 
(Montreal) and 44.50 MHz (Walsingham). The vertical resolution 
was 500 m in each case. 
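
As an illustration of how a tropopause height might be read from a single radar power profile, the sketch below picks the height of the steepest increase in backscattered power within a search window. It is a simplified stand-in, not the authors' tropopause-determination software. The function name, the 6–14 km default window (taken from the Figure 1 caption) and the synthetic profile are all assumptions.

```python
def radar_tropopause_height(heights_km, power_db, search_km=(6.0, 14.0)):
    """Height (km) of the largest positive power gradient inside the window."""
    best_gradient = None
    best_height = None
    for k in range(len(heights_km) - 1):
        z_low, z_high = heights_km[k], heights_km[k + 1]
        if z_low < search_km[0] or z_high > search_km[1]:
            continue  # gate pair lies outside the search window
        gradient = (power_db[k + 1] - power_db[k]) / (z_high - z_low)
        if best_gradient is None or gradient > best_gradient:
            best_gradient = gradient
            best_height = 0.5 * (z_low + z_high)  # midpoint of the two gates
    return best_height


# Synthetic profile on 500 m range gates: power falls with height, then
# jumps at 10 km to mimic the secondary maximum above the tropopause.
heights = [5.0 + 0.5 * k for k in range(21)]
power = [20.0 - 2.0 * (z - 5.0) if z < 10.0 else 25.0 - (z - 10.0) for z in heights]
print(radar_tropopause_height(heights, power))  # 9.75
```

With 500 m gates the estimate is quantized to the gate spacing, so hour-to-hour changes smaller than that cannot be resolved by a single-profile method like this one.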

Ozonesonde balloons were released close to the radars, and profiles 
of ozone concentration, temperature, humidity, pressure, wind speed 
and wind direction were obtained at intervals of typically 8-12 hours. 
In the Montreal case, the launches were made at the Canadian Space 
Agency Head Office in St Hubert, Quebec, while the radar was located 
on the MacDonald campus of McGill University, about 45 km away. 
At Walsingham, launches took place at the radar site. Studies were 
carried out in five campaigns, each about two weeks long. These 
simultaneous measurements of radar tropopause heights, together 
with dynamical parameters, ozone and water-vapour content, were 
supplemented by calculations using a three-dimensional dynamical 
Lagrangian particle dispersion and tracking model of ozone transport, 
called FLEXPART, which used regional meteorological analyses of 
the Canadian operational weather forecast Global Environmental 
Multiscale (GEM) model as input. More specifics can be found in 
the Methods Summary, and in the Supplementary Information. The 
results of our study demonstrate the capability of windprofilers to be 
used to understand ozone intrusions better. 

The balloons used in the study carried EN-SCI model 2Z-ECC 
ozonesondes equipped with Global Positioning System (GPS) receivers 
and Vaisala RS80 radiosondes. The vertical resolution was about 
100 m, and ozone measurement accuracy was about 5%. Over a 
hundred launches were carried out in five campaigns. Water vapour 
content was also measured by the ozonesondes, and low water vapour 
content in the middle troposphere, typically less than 0.2 mbar, associated 
with ozone peaks, was taken as partial evidence that the ozone 
enhancement had a stratospheric origin. Every campaign showed 
evidence of stratospheric ozone intrusions into the troposphere, 
but in the first campaign, high levels of radio interference prevented 
useful radar measurements. Our discussions will therefore focus on 
the subsequent four campaigns. 

Figure 1b shows a height–time ozone-density plot from Montreal 
for April–May 2005, as well as the tropopause height taken from Fig. 1a. 
The tropopause height closely follows the height at which ozone density 
increases markedly. Excursions of tropospheric ozone density 
above the background values (typically dark blue in Fig. 1b) are our 
main interest here. 'Background' ozone is generally defined as tropospheric 
ozone that is more than seven days old and therefore of uncertain 
origin, and typical values are of the order of 20–40 p.p.b. 
Figure 2 | Radar and ozone data recorded during the first campaign at 
Walsingham in November 2005. a, Altitude–time intensity plot of 
backscattered power during the campaign. The curved black broken line 
shows the radar-determined tropopause. The vertical grey broken lines show 
the start- and end-times of the period covering seven ozonesonde ascents of 
Three rapid ascents in tropopause height are labelled A, B and C, 
and a fourth, smaller one is labelled D. In each case there is evidence 
of ozone intrusion from the stratosphere. Case B is especially clear. 
In the other cases, the intrusion is less distinct, but increases are 
apparent in ground-level ozone density following the tropopause 
jumps. Jumps C and D occur close together, making separation of 
their effects harder, and the effect of event D seems delayed until after 
the campaign ended. Nevertheless, increases in low-altitude ozone 
densities are clearly associated with the tropopause jumps. In each 
case, decreased humidity served as an additional indicator that the air 
was stratospheric in origin. In case B–B′, a noticeable increase in 
surface ozone was also observed at stations from Montreal back to 
the Great Lakes, 1,000 km to the west. Back-trajectory calculations 
and meteorological analyses indicated that a significant part of this 
ozone enhancement was stratospheric in origin. Ozone had entered 
the troposphere both above Montreal and as much as 1,000 km 
upstream to the west and northwest on 3 May, and the whole ozone 
enhancement moved downward and downstream as a layer. An 
initially upstream component reached the ground in Montreal on 
7 May. Ground-level values at these stations before and after the two 
events of 30 April–2 May and 6–9 May were typically 15 p.p.b. at 
night and 30 p.p.b. during the day, but during the event of 6–9 May 
were typically 30 p.p.b. (night-time) to 50 p.p.b. (daytime). Stratospheric 
ozone had a large impact on the surface ozone densities, 
even in a large city in which photochemical effects might have been 
expected to dominate. 

Figure 2 shows an example from the Walsingham campaign of 
November 2005, which ran from 17 to 25 November (23 launches). 
The period of greatest interest is the time frame covering 23–25 
November. An enhancement of ozone appeared at an altitude of 
about 5–6 km, briefly descended, and then rose again to about 8 km 
in altitude (Fig. 2b). This layer had low water vapour content 
(<0.2 mbar), suggestive of stratospheric origin, but otherwise did 
not seem strongly associated with the stratosphere. 

However, the radar data did suggest a stratospheric link. The scat- 
tered power as a function of height and time is shown in Fig. 2a, with 
the radar-determined tropopause indicated. At about the same time 
that the ozone enhancement appeared, the radar-determined tropo- 
pause showed a rapid descent (point P in Fig. 2a). The tropopause 
temperature profiles determined from the radiosondes showed a 
correlated behaviour, with strong temperature inversions tracking 
the radar-determined tropopause. The trajectory of this enhance- 
ment is shown by the broken line. When the radar tropopause des- 
cended, it left a region of only weak scatter at the ‘normal’ tropopause 
height (marked Q on the figure). The tropopause jet stream showed a 
very wavelike structure, indicating strong nonlinear planetary wave 
interest. b, Successive ozone profiles for the seven ozonesonde launches. The 
upper, red sections of the profiles indicate the stratosphere, based on the 
World Meteorological Organization (WMO) definition of the tropopause 
(which by definition generally occurs above 500 mb), and the centre, orange 
sections emphasize the region of ozone of interest. 
Figure 3 | Three-dimensional image of the 100 p.p.b. ozone surface during 
the event of 23–25 November 2005, as determined by the FLEXPART 
model. The time was specifically 24 November 2005 at 15:00 UT. The location 
of the Walsingham radar and radiosonde launch site is shown (vertical 
yellow line). The white arrow shows north. 


activity in the lower stratosphere, as shown in maps provided by 
the Canadian Meteorological Centre (http://weatheroffice.gc.ca/ 
analysis/index_e.html). 

Figure 3 shows the results of a numerical simulation using 
FLEXPART. The model clearly indicates a deep influx of ozone from 
the stratosphere, resulting in a tongue of ozone penetrating to 4 km 
altitude, implying significant downward motion. The tongue of 
stratospheric air swept over the radar in such a way that its tip passed 
over the radar, producing the variation shown in Fig. 2b. Without the 
use of radar data, the stratospheric origin of this ozone tongue may 
well have been missed. In contrast to the Montreal case, in which the 
ozone reached the ground, the ozone in Fig. 3 appeared to mix with 
the surrounding air at 3-4 km. 

Figure 4 shows the occurrence of all ozone intrusions during the 
four campaigns studied, shown placed end-to-end. Dark blue lines 
show occasions of strong intrusion, light blue represents weaker 
intrusions, and black lines represent strong upward tropopause 
motions. Large positive vertical velocities can arise due to true tropo- 
pause jumps (as in Fig. 1b) and can also be produced by split tropo- 
pauses, as in Fig. 2. In the latter case, the tropopause-determination 
software follows the descending tropopause from point P until it 
becomes so low that it would not normally be considered a tropo- 
pause, and then finds the higher-level, weaker, tropopause (closer to 
point Q), producing an apparent jump in tropopause height. Both 
types of jumps will be considered collectively. 

Every occurrence of definite ozone intrusion is associated with a 
level 2 or level 3 radar-tropopause excursion rate at, or just before, the 
intrusion, with the exceptions of events A, B, C and D. For cases C and 
D, the tropopause was only intermittently visible with the radar (this 
happens on occasion). Of the remaining 13 intrusions, 11 (that is, all 
except A and B) were associated with a level 2 or level 3 tropopause 
excursion. Even more telling, every level 2 or level 3 tropopause 
excursion was associated with some form of ozone intrusion. 
Hence, a level 2 or level 3 tropopause jump is a very strong predictor 
for ozone intrusion. No matter whether the jumps are real, or a 
consequence of a split tropopause or discontinuity, they serve as a 
valuable diagnostic. The ability to detect ozone intrusions in this way 
is an important capability, and a major result of our study. This is 
particularly true because windprofilers generally operate 24 hours 
per day, 365 days per year, so even when ozonesonde data are not 
available, windprofiler data can be used as a proxy for the possibility 
of ozone intrusions. This will be useful for air-quality forecasts, 
stratosphere-troposphere transport research, and general under- 
standing of the ozone circulation, transport, and budget. 
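
The diagnostic described above can be sketched as a small classifier. This is an illustrative example, not the authors' software. The rate thresholds and the p.p.b. excess limits come from the Figure 4 caption, while the function names, the hourly differencing and the equating of "level" in the text with "type" in the caption are assumptions.

```python
def excursion_type(rate_km_per_h):
    """Map a tropopause ascent rate (km per hour) to excursion type 0-3."""
    if rate_km_per_h > 0.4:
        return 3
    if rate_km_per_h > 0.3:
        return 2
    if rate_km_per_h > 0.2:
        return 1
    return 0  # also covers descents and the unspecified case of exactly 0.2


def hourly_excursion_types(tropopause_km):
    """Classify the change between successive hourly tropopause heights."""
    return [excursion_type(later - earlier)
            for earlier, later in zip(tropopause_km, tropopause_km[1:])]


def flags_possible_intrusion(tropopause_km):
    """True if any hourly step reaches type 2 or type 3."""
    return any(t >= 2 for t in hourly_excursion_types(tropopause_km))


def intrusion_strength(excess_ppb, has_water_vapour_minimum):
    """Label an ozone maximum by its excess over background (p.p.b.)."""
    if not has_water_vapour_minimum:
        return "none"  # a coincident water-vapour minimum is required
    if excess_ppb >= 25:
        return "strong"
    if excess_ppb >= 15:
        return "weak"
    return "none"


# Hypothetical hourly radar tropopause heights (km).
heights_km = [10.0, 10.1, 10.6, 10.6]
print(hourly_excursion_types(heights_km))    # [0, 3, 0]
print(flags_possible_intrusion(heights_km))  # True
```

Keeping the rate classification separate from the ozone criterion mirrors the point made in the text: the radar flag can be computed continuously, even when no ozonesonde data exist to confirm an intrusion.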


METHODS SUMMARY 


The simultaneous and co-located use of windprofiler radars combined with 
frequent ozonesonde launches represents the key experimental aspect of this 
project. Windprofilers are radars that permit ground-based studies of the 
atmosphere from regions close to ground-level, to altitudes of 12 km and higher 
(depending on power output). A powerful transmitter initially emits repeated 
pulses of radio waves into the air, whereupon small portions of this signal are 
Figure 4 | Occurrence of ozone intrusions compared to radar-determined 
tropopause excursions. Dates for the campaigns are given as month/day. 
Dark-blue vertical bands represent occasions where a tropospheric ozone 
maximum occurred between 3 km altitude and the tropopause, and where 
this maximum exceeded the background value (that is, average of values 
before, after, above and below the maximum) by at least 25 p.p.b. A 
coincident local minimum in water vapour content was also required. This 
combination is taken as a strong indicator that a stratospheric intrusion had 
occurred, particularly if the layer showed evidence of descent. Light-blue 
shading indicates a weaker but nevertheless real intrusion, where in this case 
the excess of the local peak exceeded the background by 15-25 p.p.b. Grey 


bands represent occasions when there was no significant tropospheric 
maximum, and white bands indicate times when no ozone data were 
available. The date of intrusion is set according to the time at which the 
intrusion left the stratosphere, and not the time of arrival at the ground. The 
lower panel shows the vertical velocity of the tropopause, as determined 
from the radar data. Tropopause excursion velocities have been classified 
into four categories using the tropopause gradient indicator; specifically: 
type 3 means >0.4 km h⁻¹, type 2 means >0.3 km h⁻¹, type 1 means 
>0.2 km h⁻¹ and type 0 means <0.2 km h⁻¹. These categories are plotted as 
black vertical lines in the upper graph. 
returned to the radar antennas and recorded for analysis. Proper interpretation 
of these returned signals allows wind and turbulence strengths in the atmosphere 
to be measured, and in our case the height of the tropopause can also be found. 
The radars run continuously, allowing unprecedented monitoring of tropopause 
height. Ozone measurements were made using EN-SCI ozonesondes and ancillary 
equipment, including a ground station. The ozonesondes were accompanied 
by Vaisala RS-80 radiosondes for pressure, humidity and temperature measurements. 
On-board GPS receivers were used to track the sonde positions and 
allow wind velocity determination. Typically either 800 g or 1,200 g balloons 
were used, filled with sufficient helium that they achieved an ascent rate of 
3–5 m s⁻¹. Experimental studies were supported by application of the 
FLEXPART computer model, which permitted modelling of ozone movement. 
FLEXPART required hourly wind fields produced by a regional analysis model 
called GEM, run at 0.1375° × 0.1375° resolution on a domain covering North 
America with 58 vertical levels to 10 hPa. Each FLEXPART run released 600,000 
particles in the model domain, with those in the stratosphere initialized using an 
empirical relationship between potential vorticity and ozone concentration. 
These were then advected using wind fields from GEM, and the resulting ozone 
field was output at 1° × 1° × 500 m resolution. Chemistry is not included in the 
model. FLEXPART has been extensively validated. Please see the 
Supplementary Information for further details. 
Haemodynamics determined by a genetic programme 
govern asymmetric development of the aortic arch 


Kenta Yashiro, Hidetaka Shiratori & Hiroshi Hamada 


Laterality of the internal organs of vertebrates is determined by 
asymmetric Nodal signalling in the lateral plate mesoderm. A 
deficiency of such signalling results in heterotaxia syndrome, 
characterized by anomalous laterality of visceral organs and complex 
congenital heart conditions. Pitx2, the transcription factor 
induced by the Nodal signal, regulates left-right asymmetric 
morphogenesis. The cellular and molecular bases of asymmetric 
morphogenesis remain largely unknown, however. Here we show 
that ablation of unilateral Pitx2 expression in mice impairs asym- 
metric remodelling of the branchial arch artery (BAA) system, 
resulting in randomized laterality of the aortic arch. Pitx2-positive 
cells were found not to contribute to asymmetrically remodelled 
arteries. Instead, Pitx2 functions in the secondary heart field and
induces a dynamic morphological change in the outflow tract of 
the heart, which results in the provision of an asymmetric blood 
supply to the sixth BAA. This uneven distribution of blood flow 
results in differential signalling by both the platelet-derived 
growth factor receptor and vascular endothelial growth factor 
receptor 2. The consequent stabilization of the left sixth BAA 
and regression of its right counterpart underlie left-sided forma- 
tion of the aortic arch. Our results therefore indicate that haemo- 
dynamics, generated by a Pitx2-induced morphological change in 
the outflow tract, is responsible for the asymmetric remodelling of 
the great arteries. 

To understand how the Nodal-Pitx2 signalling pathway brings 
about asymmetric organ morphology, we studied the mechanism 
of situs-specific development of the aortic arch. Mutant mice 
(Pitx2ΔASE/ΔASE) lacking the asymmetric enhancer (ASE) of Pitx2
develop right isomerism, which manifests in part as severe congenital
heart conditions. The laterality of the aortic arch also seems to
be randomized in these mice (Fig. 1a, Supplementary Fig. 1, and data
not shown). The cardiovascular phenotype of Pitx2ΔASE/ΔASE mice
seemed similar to that of a Pitx2c−/− mutant, but re-evaluation of
newborn Pitx2ΔASE/ΔASE mice further revealed that laterality of the
patent sixth BAA was randomized (whereas the fourth BAA remained 
normal) and was always concordant with that of the patent dorsal 
aorta (dAo). The correlation between the laterality of the sixth BAA 
and that of the dAo, together with the fact that the earliest event of 
asymmetric BAA remodelling is thought to be regression of the right 
sixth BAA, suggested that the situs of the aortic arch depends on
the side on which the sixth BAA undergoes regression.

The arterial system is initially formed symmetrically, with subsequent
asymmetric remodelling giving rise to the aortic arch.
Thus, two regions, namely the right sixth BAA and the right
dAo, undergo complete regression in the wild type (Fig. 1a). In
Pitx2ΔASE/ΔASE embryos, the symmetrical BAA system seems to form
normally (Supplementary Fig. 2a). As in the Pitx2c mutant mice, the
migration and function of cardiac neural crest cells also seem to be
normal in Pitx2ΔASE/ΔASE mice, given that they do not manifest
persistent truncus arteriosus (Supplementary Fig. 2b). These observations
suggested that the aortic arch anomaly in Pitx2ΔASE/ΔASE mice
is due to impaired BAA remodelling.

To determine the role of Pitx2 in BAA remodelling, we next examined 
the contribution of Pitx2-positive cells to the developing BAA system. 
Unexpectedly, we did not detect any Pitx2-positive cells in or near the 
sixth BAA or dAo (Fig. 1b, d, and Supplementary Fig. 2c-e). As expected,
Pitx2-positive cells were abundant in the secondary heart field, including
the myocardium of the left wall of the outflow tract (OFT) (Fig. 1c). These
results indicated that, although BAA remodelling requires Pitx2, it is not
governed directly by Pitx2-positive cells, whereas morphogenesis of the
OFT does involve such cells.

To determine whether unknown morphological features of the 
OFT contribute to asymmetric remodelling of the sixth BAA, we
examined OFT morphology at embryonic day (E)11.5 and E12.0
by optical projection tomography. At E11.5, immediately before
BAA remodelling, the OFT spiralled through 180° in wild-type
embryos (Fig. 1e). In Pitx2ΔASE/ΔASE mice, however, the OFT had
failed to adopt the spiral structure and remained linear. This finding 
suggested that Pitx2 is required for formation of the spiral structure 
of the OFT, which is probably responsible for the correct alignment 
of the ventricles with the great arteries. At E12.0, the spiral structure 
of the OFT undergoes a 90° rewinding as a result of a clockwise 
rotational movement of the arterial pole (Fig. 1f). This rotation
shifts the entry point of the right sixth BAA towards the left, adjacent 
to the aorta (Fig. 1f). Simultaneously, the right sixth BAA undergoes a 
decrease in diameter and becomes longer than its left counterpart 
(Fig. 1f). In Pitx2ΔASE/ΔASE embryos, however, this rotational movement
fails to occur, and the sixth BAA remains bilaterally patent at
E12.5 (Supplementary Fig. 3). These results suggested that the rota- 
tional movement of the arterial pole that takes place between E11.5 
and E12.0 may be responsible for the right-side-specific regression of 
the sixth BAA in wild-type mice. Neither cell death nor cell growth 
seems to be the initial cue for BAA remodelling (Supplementary 
Fig. 4a-o, Supplementary Fig. 5a).

Rotation of the arterial pole renders the right sixth BAA longer 
and narrower than its left counterpart (Figs 1f and 2a), changes that 
would be expected to reduce the flow rate in the right vessel. Previous 
studies have suggested that haemodynamics may regulate morphogenesis
of the aortic arch. Furthermore, echocardiography of the
uterus revealed that blood flow in the left dAo was similar to that in
the right dAo at E11.5 but was significantly greater during arterial
remodelling at E12.0 in wild-type embryos, whereas blood flow in
the dAo was bilaterally equal in Pitx2ΔASE/ΔASE embryos at both
developmental stages (Supplementary Fig. 6 and Supplementary 
Table 1). A decrease in or loss of blood flow in the right sixth BAA 
of wild-type embryos might therefore result in its regression. To test 


1Developmental Genetics Group, Graduate School of Frontier Biosciences, Osaka University, 1-3 Yamada-oka, Suita, Osaka 565-0871, Japan. 2CREST, Japan Science and Technology
Corporation (JST), 1-3 Yamada-oka, Suita, Osaka 565-0871, Japan. †Present address: Translational Cardiovascular Therapeutics, William Harvey Research Institute, Barts and The
London, Queen Mary's School of Medicine and Dentistry, University of London, Charterhouse Square, London EC1M 6BQ, UK.
this hypothesis, we ligated the left sixth BAA immediately before the
onset of remodelling; this did not affect heart beating (Fig. 2b, c, 
Supplementary Table 2 and Supplementary Movies 1-4). Ligated 
embryos were then cultured in vitro for 36h before examination of 
arterial morphology (Fig. 2d-f). In control embryos, BAA remodelling
occurred normally. However, in about half of the embryos with
the left sixth BAA ligated, the right sixth BAA persisted, whereas the 
ligated left artery underwent complete regression in all cases. In the 
embryos in which the right sixth BAA remained patent, the diameter 
of the dAo was larger on the right side than on the left. Ligation 
induced ectopic apoptosis in the left sixth BAA (Supplementary 
Fig. 4p-u). These results indicate that blood flow is essential and
sufficient for the persistent patency of the sixth BAA and dAo. 
In support of this notion, a decrease in heart rate induced by the 
Figure 1 | Remodelling of the sixth BAA is governed in a non-cell-autonomous
manner by left-side-specific Pitx2. a, Patterns of BAA
remodelling in Pitx2ΔASE/ΔASE embryos as represented by system
morphology at E11.5. Red regions, complete regression; R, right; L, left; AS,
aortic sac; 7is, seventh intersegmental artery; 3, 4 and 6 in wild-type (WT)
indicate third, fourth and sixth BAA, respectively. b-d, Staining for Pitx2-positive
cells around remodelled arteries (b, d) and outflow tract (c) in
horizontal sections of E11.5 wild-type embryos. The boxed region in b is
shown in d. The ventral side is at the top. Red and blue arrows, sixth BAA; red
and blue arrowheads, dAo; open arrowhead, Pitx2-positive cells; LA, left
atrium; LV, left ventricle; RA, right atrium; RV, right ventricle. Scale bars,
250 µm (b, c) or 100 µm (d). e, f, Arterial pole rotation in WT embryos
between E11.5 (e) and E12.0 (f). Blue arrows, right ventricular outflow tract 
(RVOFT), red arrows, left ventricular outflow tract (LVOFT); PA, 
pulmonary artery; blue asterisk, pulmonary trunk; red asterisk, aortic trunk. 
β-adrenergic antagonist propranolol resulted in bilateral sixth BAA
regression in wild-type embryos (see below). 
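
The expectation stated earlier, that a longer and narrower right sixth BAA should carry less blood, can be illustrated with the Hagen-Poiseuille relation, Q = πΔP r⁴ / (8μL). The sketch below is only an illustration: it assumes steady laminar flow of a Newtonian fluid through a rigid tube, which embryonic vessels only approximate, and the geometry values are hypothetical rather than measurements from this study.

```python
import math


def poiseuille_flow(delta_p, radius, length, viscosity):
    """Volumetric flow rate for steady laminar flow in a rigid tube.

    Q = pi * delta_p * radius**4 / (8 * viscosity * length)
    """
    return math.pi * delta_p * radius**4 / (8.0 * viscosity * length)


def relative_flow(radius_ratio, length_ratio):
    """Flow relative to a reference vessel at equal pressure drop and viscosity."""
    return radius_ratio**4 / length_ratio


# Hypothetical geometry (not measured values): the right vessel is
# 20% narrower and 25% longer than the left one.
ratio = relative_flow(radius_ratio=0.8, length_ratio=1.25)
print(f"right/left flow ratio: {ratio:.2f}")
```

Because flow scales with the fourth power of the radius, even a modest narrowing dominates the effect of lengthening; in this hypothetical case the right vessel would carry roughly one third of the flow of the left.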

We next examined whether growth factor signalling might change 
during regression of the sixth BAA, as occurs with the regression of 
tumour vessels. Among several signalling molecules examined,
platelet-derived growth factor (PDGF)-A and vascular endothelial 
growth factor receptor 2 (VEGFR2) were found to be associated 
with remodelling of the sixth BAA (Fig. 3a-h). In wild-type 
embryos at E12.0, Pdgfa expression in the sixth BAA was maintained 
on the left side but decreased on the right (Fig. 3a-c). Furthermore,
the level of phosphorylated VEGFR2 (pVEGFR2) in endothelial cells
was decreased in the right sixth BAA near the OFT (Fig. 3g). In
Pitx2ΔASE/ΔASE embryos, however, Pdgfa (Fig. 3d-f) and pVEGFR2
(Fig. 3h) remained symmetrical. Conversely, in four of six embryos in 


[Figure 2 panel labels: ligation of left 3rd, 4th and 6th BAAs; whole embryo culture; histological analysis]


Figure 2 | Blood flow is essential and sufficient for persistent patency of 
the sixth BAA. a, Proposed mechanism for regression of the right sixth BAA. 
Ao, aorta; PAt, pulmonary trunk; 6, sixth BAA. b, c, Experimental strategy to 
determine the effect of ligation of the sixth BAA on its right counterpart. HL, 
hindlimb bud; FL, forelimb bud; 3, 4 and 6 in the diagram indicate third, 
fourth and sixth BAA, respectively. d, Frequency of persistent patency of the 
right sixth BAA in embryos subjected, or not (control), to ligation of the left 
sixth at E11.0. The numbers of embryos showing each morphological 
pattern are indicated in the bar graph: blue, persistently patent; black, under 
regression; yellow, completely regressed. e, f, Haematoxylin/eosin staining 
of horizontal sections of cultured embryos either subjected to ligation of the 
left sixth BAA (f) or unligated (e). Ventral side is up. Black arrow, left sixth 
BAA; red arrow, right sixth BAA; asterisks, dAo. Scale bars, 250 µm.
which the left sixth BAA had been ligated, the levels of Pdgfa and
pVEGFR2 were decreased on the left side and maintained on the right
side (Fig. 3i, j). These observations suggested that a differential blood 
supply results in asymmetry both in Pdgfa expression and in the 
amount of activated VEGFR2 in the sixth BAA, which is consistent 
with the higher rate of cell proliferation apparent in the left sixth BAA 
than in its right counterpart at E12.0 (Supplementary Fig. 5b). 

To examine further the roles of PDGF and VEGF, we investigated 
the effects of chemical inhibitors of PDGF receptor (PDGFR) or 
VEGFR signalling on the patency of the sixth BAA in wild-type
embryos. Although administration of a VEGFR tyrosine kinase 
inhibitor or PDGFR inhibitor (AG1296) alone to E11.0 embryos in 
culture had only small effects, administration of both together 
induced bilateral loss of the sixth BAA in most cases (Fig. 4a). 
Similarly, bilateral regression of the sixth BAA was induced by 
treatment with AG1433, which inhibits both PDGFR and VEGFR2 
signalling at 10 µM. The effects of these inhibitors were specific to
the sixth BAA, given that the gross appearance of the embryo, heart
and third and fourth BAAs was not affected (Supplementary Fig. 7a,
b, Supplementary Videos 5-9, Supplementary Table 2, and data not 
shown). Although the molecular basis of this specificity is unknown, 
Figure 3 | Relation between blood supply and both the expression of Pdgfa
and the level of activated VEGFR2 in the sixth BAA. a-f, Expression of Pdgfa
in frontal sections of wild-type (a-c) or Pitx2ΔASE/ΔASE (d-f) embryos at E12.0.
Asterisks, sixth BAA. Scale bars, 250 µm (a, d) or 100 µm (b, c, e, f).
g, h, Immunofluorescence analysis of VEGFR2 and pVEGFR2 of the sixth BAA
(asterisks) in frontal sections of wild-type (g) or Pitx2ΔASE/ΔASE (h) embryos at
E12.0. VEGFR2 was symmetrical in both genotypes. Blue, nuclei. Scale bars,
100 µm. i, j, pVEGFR2 and Pdgfa in frontal sections of wild-type embryos
subjected (j) or not (i) to ligation of the left sixth BAA. VEGFR2 was not
affected (data not shown) (j). Asterisks, sixth BAA. Scale bars, 100 µm.
the sixth BAA has been suggested to possess different characteristics
from the fourth. Furthermore, our results are consistent with previous
observations that mice deficient in PDGF or VEGF signalling
manifest normal regression of the sixth BAA and a low frequency of
right-sided aortic arch. Loss of both PDGFR and VEGFR signalling
therefore seems to result in bilateral regression of the sixth BAA.
Furthermore, expression of a Vegfa transgene, but not that of a Pdgfa 
transgene, in the smooth muscle cells of the great arteries prevented 
the right sixth BAA and dAo from undergoing complete regression 
(three of nine Vegfa transgene-positive embryos; Fig. 4b), although 
these vessels were hypoplastic or had collapsed, suggesting that the 
VEGF signal is sufficient to preserve the sixth BAA and dAo (Sup- 
plementary Discussion 1). 

On the basis of our results, we propose a model for situs-specific 
BAA remodelling (Fig. 4c). First, the genetic programme, including 
the expression of Pitx2, induces a dynamic morphological change in 
the OFT, which in turn generates a differential distribution of blood 
flow. Blood flow in the left sixth BAA is sufficient to stimulate 
Figure 4 | Both PDGF and VEGF signals are required for persistent patency
of the sixth BAA. a, Frequency of patency of the left sixth BAA in embryos
cultured with or without inhibitors. DMSO, dimethylsulphoxide (vehicle);
VEGFRI, VEGFR tyrosine kinase inhibitor. Blue columns, completely
regressed; black columns, under regression; yellow columns, persistently
patent. b, X-Gal staining of horizontal sections of E13.5 embryos harbouring
an SM22α-Vegfa-IRES-lacZ or SM22α-Pdgfa-IRES-lacZ transgene. Asterisk,
dAo; red arrow, unregressed right dAo; blue arrow, unregressed right sixth
BAA. Ventral side is up. Scale bar, 250 µm. c, Model for the conversion of
genetic (leftness) information into asymmetric aortic arch morphogenesis. 
AS, aortic sac. 
PDGFR and VEGFR2 signalling and the consequent maintenance of
arterial structure, whereas the decreased blood flow in the right sixth 
BAA results in its regression. This sequence of events ensures that the 
aortic arch is established on the left side by E12.5. 

Various mutant mice show a laterality defect of the aortic arch 
without a deficiency in the Nodal-Pitx2 signalling pathway.
However, all such mutants manifest a combination of hypoplasia 
or aplasia of BAAs, malfunction or impaired migration of cardiac 
neural crest cells, and an OFT anomaly. The causes of the aortic arch 
anomaly in these animals might therefore differ from those of the 
anomaly in the Pitx2ΔASE/ΔASE mutant, although the anomalies may
share a common molecular mechanism to some extent. 

The notion that blood flow is responsible for the laterality of the 
aortic arch is consistent with the aetiology of congenital heart dis- 
eases in humans (Supplementary Discussion 2). It remains unknown, 
however, how Pitx2 governs the formation of the spiral septum and 
the subsequent rotation of the arterial pole. Further studies are there- 
fore necessary to characterize the cellular and molecular mechanisms 
underlying the morphological changes of the OFT. 


METHODS SUMMARY 

Mice. The generation of Pitx2ΔASE mutant mice, Pitx2 17-Cre transgenic
mice, Wnt1-Cre transgenic mice and CAG-CAT-lacZ reporter mice was
described previously. 

Histological analyses. In situ hybridization, immunostaining and staining of 
sections with X-Gal (5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside) were
performed as described previously.

Optical projection tomography (OPT). OPT experiments were performed as 
described previously.

Whole-embryo culture. Whole-embryo culture was performed as described.

Cell lineage analysis. Pitx2 17-Cre, Wnt1-Cre or CAG-CAT-lacZ transgenes
were introduced into Pitx2ΔASE/+ heterozygotes.
Full Methods and any associated references are available in the online version of
the paper at www.nature.com/nature. 
METHODS

Mice. The SM22α-Vegfa-IRES-lacZ and SM22α-Pdgfa-IRES-lacZ transgenes
were constructed from mouse Vegfa or Pdgfa complementary DNAs (provided
by S. Nishikawa and P. Soriano, respectively), an IRES-lacZ cassette and a
2.8-kilobase genomic DNA fragment including the promoter and enhancer of
mouse SM22α required for expression in smooth muscle cells of great arteries (ref. 27).
The latter fragment was amplified by PCR with the primers
5'-GCCAGGACAGATGCAGGTAGGAGACTTTGG-3' and
5'-AGGAGAGTAGCTTCGGTGTCTGGGCTGGGG-3'.

Histological analyses. The assay of bromodeoxyuridine incorporation was 
performed as described previously; embryos were dissected and fixed 30 min
after intraperitoneal injection of bromodeoxyuridine into pregnant mice. 
Immunostaining was performed with antibodies specific for cleaved caspase 3 
(dilution 1:100; Cell Signalling), for VEGFR2 (dilution 1:100; Cell Signalling), 
for platelet-endothelial cell adhesion molecule (PECAM; dilution 1:20; 
Upstate Biotechnology) or for phosphorylated VEGFR2 (dilution 1:25; Spring 
Biosciences), as described previously. Antigen unmasking by autoclaving of
sections for 10 min in 10 mM sodium citrate (pH 6.0) was performed before
incubation with the antibodies against cleaved caspase 3 and VEGFR2, and 
Can Get Signal immunodetection solution B (Toyobo) was used for the detec- 
tion of pVEGFR2. Immune complexes were detected with Alexa568-conjugated 
or Alexa488-conjugated secondary antibodies against rabbit or mouse immuno- 
globulin G (dilution 1:400; Molecular Probes) or with a peroxidase-based 
Vectastain kit (Vector). Nuclei were stained either with 4’,6-diamidino-2- 
phenylindole (for immunofluorescence), nuclear fast red (for X-Gal staining 
and in situ hybridization), methyl green or haematoxylin (for staining with 
the peroxidase-based Vectastain kit). Images were obtained and analysed with 
a confocal microscope (LSM510 META; Carl Zeiss) and Axiophoto2 (Carl 
Zeiss). 

Cell lineage analysis. Pitx2 17-Cre, Wnt1-Cre or CAG-CAT-lacZ transgenes
were introduced into Pitx2ΔASE/+ heterozygotes. Analysis of the lineage distribution
of neural crest cells or of Pitx2-positive cells of the left lateral plate
mesoderm was performed as described previously. Pitx2-positive cells were
localized with a Cre transgene driven by the ASE of Pitx2 (Pitx2 17-Cre) and
with a Cre-responsive lacZ transgene (CAG-CAT-lacZ); this approach would be
expected to detect cells actively expressing Pitx2 as well as those that had pre- 
viously expressed Pitx2, given that the ASE is active from E8.0. 

Treatment with chemical inhibitors. AG1296, AG1433 and VEGFR tyrosine
kinase inhibitor were obtained from Calbiochem, and propranolol from
Sigma. All were dissolved in dimethylsulphoxide (Sigma). Median inhibitory
concentrations provided by the manufacturer were 1.0 µM for AG1296
(PDGFR), 5 µM (PDGFR) and 9.4 µM (VEGFR2) for AG1433, and 0.1 µM for
VEGFR tyrosine kinase inhibitor (VEGFR2). ICR mouse embryos dissected at
E11.0 were cultured with vehicle alone or with inhibitors for 36 h (unless indicated
otherwise), when they had developed to a stage equivalent to E12.5. Only
well-developed embryos with a normal heartbeat were subjected to further analysis,
except in the case of propranolol. Embryos cultured in the presence of 25 µM propranolol
were always deformed, which might have resulted from extremely reduced peripheral
circulation.
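
As a rough guide to what the quoted median inhibitory concentrations imply for AG1433 applied at 10 µM, the sketch below uses a simple one-site inhibition model with a Hill coefficient of 1. Both the model and the assumption that the manufacturer's values carry over to whole-embryo culture are simplifications introduced here for illustration, not part of the reported method.

```python
def fractional_inhibition(concentration_um, ic50_um, hill=1.0):
    """Fraction of target activity inhibited under a simple Hill model.

    Assumes a single binding site and, by default, a Hill coefficient of 1.
    """
    c = concentration_um ** hill
    return c / (c + ic50_um ** hill)


# Median inhibitory concentrations for AG1433 quoted in the Methods (µM).
AG1433_IC50_UM = {"PDGFR": 5.0, "VEGFR2": 9.4}
DOSE_UM = 10.0  # concentration of AG1433 used in the embryo cultures

for target, ic50 in AG1433_IC50_UM.items():
    inhibited = fractional_inhibition(DOSE_UM, ic50)
    print(f"{target}: {inhibited:.0%} inhibited at {DOSE_UM:g} µM")
```

Under these assumptions a 10 µM dose would inhibit roughly two thirds of PDGFR activity and about half of VEGFR2 activity, which is consistent with describing AG1433 as acting on both pathways at that concentration.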

Ligation experiments. In the arterial ligation experiments, the third, fourth and 
sixth BAAs on the left side of E11.0 or E11.5 ICR embryos were carefully ligated 
with a suture needle (nylon monofilament 10-0, T4A10N10-25; Bear Medic) so 
as not to damage the great vessels and atrium. Control embryos were simply 
punctured near the left BAAs with the suture needle. For examination of arterial 
morphogenesis, E11.0 embryos were cultured for 36 h, when they had developed 
to the stage equivalent to E12.5. For analysis of Pdgfa expression and pVEGFR2 
distribution, E11.5 embryos were cultured for 4-8 h. For evaluation of apoptosis, 
E11.0 embryos were cultured for 12 h. Only well-developed embryos with a
normal heartbeat were subjected to further analysis. 

Optical projection tomography (OPT). A Bioptonics OPT scanner 3001 was 
used for scanning (Bioptonics Microscopy, MRCT). Data were analysed with 
Amira v.4.0 software (TGS). Arterial morphology was revealed by whole-mount
immunostaining with antibodies against PECAM, and heart morphology
by autofluorescence of the embryo itself. 

Echocardiography. Echocardiography of fetuses was performed as described 
previously with the use of a Vevo-770 imaging system, a Vevo Integrated
Rail System and a Vevo Mouse Handling Table (VisualSonics). The probe 
was RMV-704. The rectal temperature of the dam was monitored continuously 
and maintained at 37°C during measurements. Data obtained from embryos 
with bradycardia or tachycardia for the corresponding gestational age were
discarded. 

Video recording. Embryos cultured for 12-14 h were transferred to DMEM
medium without phenol red (Sigma) at 37°C and were immediately subjected 
to recording with a MAS30-SR1M adaptor (MeCan Imaging) and a HandyCam 
DCR-SR300 (Sony). The resulting videos (MPEG2 format) were edited and 
changed to QuickTime movies with QuickTime Pro (Apple). 


27. Li, L., Miano, J. M., Mercer, B. & Olson, E. N. Expression of the SM22α promoter in
transgenic mice provides evidence for distinct transcriptional regulatory 
programs in vascular and visceral smooth muscle cells. J. Cell Biol. 132, 849-859 
(1996). 
An essential role for a CD36-related receptor in
pheromone detection in Drosophila


Richard Benton, Kirsten S. Vannice & Leslie B. Vosshall


The CD36 family of transmembrane receptors is present across 
metazoans and has been implicated biochemically in lipid binding 
and transport. Several CD36 proteins function in the immune
system as scavenger receptors for bacterial pathogens and seem
to act as cofactors for Toll-like receptors by facilitating recognition
of bacterially derived lipids. Here we show that a
Drosophila melanogaster CD36 homologue, Sensory neuron mem- 
brane protein (SNMP), is expressed in a population of olfactory 
sensory neurons (OSNs) implicated in pheromone detection. 
SNMP is essential for the electrophysiological responses of OSNs 
expressing the receptor OR67d to (Z)-11-octadecenyl acetate 
(cis-vaccenyl acetate, cVA), a volatile male-specific fatty-acid- 
derived pheromone that regulates sexual and social aggregation 
behaviours. SNMP is also required for the activation of the moth
pheromone receptor HR13 by its lipid-derived pheromone ligand
(Z)-11-hexadecenal, but is dispensable for the responses of the
conventional odorant receptor OR22a to its short hydrocarbon 
fruit ester ligands. Finally, we show that SNMP is required for 
responses of OR67d to cVA when ectopically expressed in OSNs 
not normally activated by pheromones. Because mammalian 
CD36 binds fatty acids, we suggest that SNMP acts in concert
with odorant receptors to capture pheromone molecules on the 
surface of olfactory dendrites. Our work identifies an unantici- 
pated cofactor for odorant receptors that is likely to have a wide- 
spread role in insect pheromone detection. Moreover, these results 
define a unifying model for CD36 function, coupling recognition
of lipid-based extracellular ligands to signalling receptors in both 
pheromonal communication and pathogen recognition through 
the innate immune system. 

Insect odorant receptors represent a novel class of polytopic 
membrane proteins unrelated to vertebrate G-protein-coupled chemosensory
receptors. The functional insect odorant receptor is a
heteromer of a ligand-binding subunit and the highly conserved
OR83b co-receptor, which mediates transport to sensory cilia.
Little is known about how this complex recognizes odours and 
evokes neuronal depolarization. To isolate novel components 
involved in insect olfactory detection, we used a bioinformatic 
approach to identify molecules that exhibit the same insect-specific 
orthology and olfactory-specific tissue expression as these receptors 
(Fig. 1). Two-thousand one-hundred and thirty-five Drosophila 
genes with insect-specific orthologues were identified by comparing 
the fruit fly (Drosophila melanogaster), mosquito (Anopheles gam- 
biae) and eight non-insect genomes using the OrthoMCL algorithm 
(Fig. 1a). Broadly expressed genes were excluded by selecting only
the 616 genes with fewer than two expressed sequence tags. We 
recovered all classes of known insect chemosensory genes, including 
odorant receptors, gustatory receptors, odorant and other chemo- 
sensory binding proteins, and putative odour-degrading enzymes 
(Fig. 1b and Supplementary Table 1). The remaining genes were
classified on the basis of predicted protein domains (Fig. 1b and 
Figure 1 | A comparative genomics screen for olfactory molecules
identifies Drosophila SNMP, a CD36-related receptor. a, Summary of
bioinformatic screen. EST, expressed sequence tag. b, Pie chart of putative
functions of genes retrieved from the screen. c, RT-PCR of Snmp
homologues in Drosophila and Anopheles. Control RT-PCR products: Cam
(Drosophila) and rps7 (Anopheles). d, Phylogenetic tree of insect SNMPs and 
related Drosophila and mammalian CD36 proteins. Values are uncorrected 
(‘p’) distance. 
Supplementary Table 1) and included many implicated in immunity
and defence. 
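
The two filtering steps of the screen described above (insect-specific orthology, then fewer than two expressed sequence tags) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the `Gene` record, the gene names and the example data are hypothetical, and "insect-specific" is assumed here to mean that the orthologue group contains exactly the two insect species and none of the non-insect genomes.

```python
from dataclasses import dataclass

# Assumed definition of the insect species set used for the comparison.
INSECTS = frozenset({"Drosophila melanogaster", "Anopheles gambiae"})


@dataclass(frozen=True)
class Gene:
    name: str
    ortholog_species: frozenset  # species represented in the orthologue group
    est_count: int               # number of expressed sequence tags


def is_insect_specific(gene):
    """True if the orthologue group covers both insects and no other species."""
    return gene.ortholog_species == INSECTS


def screen(genes, max_ests=2):
    """Keep insect-specific genes with fewer than max_ests ESTs."""
    return [g.name for g in genes
            if is_insect_specific(g) and g.est_count < max_ests]


# Hypothetical example records, for illustration only.
genes = [
    Gene("geneA", INSECTS, 0),                     # kept
    Gene("geneB", INSECTS | {"Homo sapiens"}, 0),  # has a non-insect orthologue
    Gene("geneC", INSECTS, 5),                     # broadly expressed
]
print(screen(genes))
```

Applying the EST threshold after the orthology filter mirrors the order given in the text, in which 2,135 insect-specific genes were narrowed to 616 candidates.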

Three-hundred and thirty-nine uncharacterized genes were 
screened for selective expression in the antenna—the major olfactory 
organ of Drosophila—by reverse transcriptase-polymerase chain 
reaction (RT-PCR). Of these, we focus here on Snmp, an antennal-enriched
gene related to the CD36 receptor family (Fig. 1c). The
Anopheles homologue of Snmp was also antennal-specific (Fig. 1c), 
consistent with the previously described olfactory-specific expression 
pattern of the silk moth (Antheraea polyphemus) homologue Snmp-1 
(ref. 17). SNMPs form an insect-specific sub-group of the CD36 
family (Fig. 1d), explaining how Drosophila Snmp emerged from 
our bioinformatic screen. 

In the antenna, Snmp was found prominently expressed in a lateral-distal
population of OSNs that co-express Or83b (Fig. 2b),
in non-neuronal support cells that surround these OSNs, and in
support cells elsewhere in the antenna and chemosensory organs
on the proboscis (Fig. 2b, and data not shown). Genetic labelling
of SNMP-expressing OSNs with mouse CD8 fused to green fluorescent
protein (CD8-GFP) revealed that these neurons target nine
glomeruli in the antennal lobe (Fig. 2c), namely DA1, VA1d, VA1l/m,
DL3, DA4m, DA4l, DA2, DC3 and DC1, corresponding to those
innervated by OSNs of the trichoid sensilla, which are involved in
pheromone detection.

Using a peptide antibody, SNMP was found concentrated in tri- 
choid sensory cilia, where it co-localized with OR83b (Fig. 2d), but 
only at very low levels in the cell bodies and axons (Fig. 2d, and data 
not shown), similar to moth SNMP-1 (ref. 17). We did not observe 
SNMP in non-trichoid OSNs, but it was expressed in support cells 
Figure 2 | SNMP localizes to sensory cilia of pheromone-sensitive OSNs.
a, Olfactory sensilla distribution on the third antennal segment. Tissue 
orientation is always dorsal up, lateral left. b, RNA in situ hybridization of 
Snmp and Or83b on a wild-type antennal section. c, Immunostaining of 
CD8-GFP (anti-GFP) and neuropil (nc82) of a whole-mount brain of an 
Snmp-promoter-VP22—GAL4/UAS-CD8-GFP animal. d, Immunostaining 
of SNMP and OR83b on a wild-type antennal section. e, Immunostaining of 
SNMP-GFP (anti-GFP) and OR83b on antennal sections of control 
heterozygous (left) and homozygous (right) Or83b-null mutant animals. A 
model of SNMP-GFP is at the top left. f, Intrinsic YFP fluorescence in 
antennal sections of animals expressing YFP(1)-OR83b, SNMP-YFP(2) or 
both fusion proteins, as indicated. Drosophila genotypes are in Methods. 
throughout the antenna (Fig. 2d). All anti-SNMP immunoreactivity
was abolished in an snmp-null mutant (see below), confirming antibody
specificity. Although the localization of SNMP in OSN cilia was
similar to that of odorant receptors, it did not depend on OR83b 
when we expressed a functional SNMP-GFP fusion protein in OSNs 
innervating basiconic sensilla (Fig. 2e, and Supplementary Fig. 1). 
Therefore, SNMP ciliary trafficking is independent of both specific 
ligand-binding odorant receptors and OR83b. We examined whether 
SNMP might still contact odorant receptors in trichoid cilia by using 
the fluorescent protein fragment complementation assay. We generated
and functionally verified SNMP and OR83b bearing complementary
fragments of a yellow fluorescent protein (YFP) reporter
(Fig. 2f and Supplementary Fig. 1). Reconstitution of the fluorescent 
YFP signal in sensory cilia was only observed when both fusion pro- 
teins were expressed (Fig. 2f). As the YFP fragments do not self- 
associate, this reconstitution could only result if SNMP and OR83b 
were brought into close proximity (<80 Å), providing evidence that
SNMP is closely apposed to, although not necessarily directly inter- 
acting with, odorant receptors in the sensory compartment. 

We generated null mutants in Snmp by gene targeting (Fig. 3a-d).
snmp mutants are viable and fertile with no gross morphological or 
locomotor defects. We examined the function of SNMP in the sub- 
population of trichoid sensilla innervated by neurons expressing 
OR67d—the best-characterized Drosophila pheromone receptor that 
recognizes cVA. In snmp mutants, neither the expression of Or67d
nor the ciliary localization of GFP-OR67d or OR83b was affected
(Fig. 3c, e) and axonal projections of snmp mutant OR67d-expressing 
neurons to the antennal lobe were wild type (Fig. 3f). The expression 
of LUSH, an odorant-binding protein secreted by trichoid sensilla 
support cells into the lymph, was normal (Fig. 3g). Thus Snmp is
dispensable for the development of trichoid OSNs and support cells. 

Figure 3 | Genetic analysis of Snmp. a, Snmp genomic locus and gene-targeting
strategy. b, PCR confirmation of homozygous snmp-null mutant
(snmp’) using primer pairs indicated in a. c, RNA in situ hybridization of
Snmp and Or67d in antennal sections of control heterozygous (snmp¹/+) or
homozygous snmp mutant (snmp¹/snmp²) animals. d, Immunostaining of
SNMP on antennal sections of control wild-type or homozygous snmp
mutant (snmp¹/snmp²) animals. e, Immunostaining of GFP-OR67d
(anti-GFP) and OR83b on antennal sections of control heterozygous or
homozygous snmp mutant animals. f, Immunostaining of CD8-GFP-labelled
OR67d-expressing axon termini (anti-GFP) and neuropil (nc82) on
whole-mount brains of control heterozygous and homozygous snmp mutant
animals. Or67d-GAL4 labels two glomeruli: DA1 (receives input from
OR67d-expressing OSNs) and VA6 (black asterisk; receives input from
OR82a-expressing OSNs), the latter originally interpreted by us as
co-convergence but later shown to be due to ectopic expression of
Or67d-GAL4 in OR82a neurons. g, Immunostaining of LUSH on antennal
sections of control wild-type or homozygous snmp mutant (snmp¹/snmp²)
animals.

We investigated whether the responses of OR67d neurons to
cVA stimulation were altered in snmp mutants. The relatively low
spontaneous activity of the OR67d neuron was observable as a
sparse distribution of action potentials of uniform amplitude 
(Fig. 4a). On stimulation with cVA, wild-type neurons responded 
with a robust train of action potentials in a dose-dependent manner 
(Fig. 4a, b). snmp mutant neurons displayed no cVA-evoked
electrophysiological responses at any concentration tested (Fig. 4a, b), but
showed an increase in spontaneous activity (Fig. 4a). Both spontan- 
eous and stimulus-evoked responses were fully restored by expres- 
sion of the Snmp rescuing transgene in OR67d-expressing neurons 
(Fig. 4a, b, and Supplementary Fig. 2), but not by expression in 
support cells surrounding these neurons (Fig. 4a, b, and Supplementary
Fig. 2). Expression of a distinct Drosophila CD36-related protein,
NINAD, in OR67d-expressing neurons did not rescue electrophysio- 
logical defects of snmp mutants (data not shown). Thus, SNMP has 
an essential, cell-autonomous and specific function in OR67d- 
expressing neurons in mediating responses to cVA. 

cVA detection is also dependent on LUSH and the OR67d/OR83b 
heteromeric receptor complex (data not shown), suggesting that
SNMP acts with these proteins in a signalling pathway. In contrast
to snmp mutants, however, loss of lush, Or67d or Or83b severely
decreased spontaneous activity of these neurons (Fig. 4c, d).
Double-mutant analysis of this spontaneous activity phenotype 
revealed that Snmp is epistatic to lush, because OR67d-expressing 
neurons retained high levels of spontaneous activity in animals lack- 
ing both SNMP and LUSH (Fig. 4c, d). In contrast, snmp Or83b 
double mutants were, like Or83b, electrically silent (Fig. 4c, d). 
Although the mechanism by which spontaneous activity is regulated 
in Drosophila OSNs is unknown, our genetic analysis indicates that 
SNMP may act downstream of LUSH and upstream of, or in parallel 
with, odorant receptors in the generation of action potentials. 
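
The double-mutant reasoning above can be sketched in a few lines of Python. This is only an illustration of the standard epistasis rule (the double mutant shows the phenotype of the epistatic gene); the qualitative labels are taken from the text, and the table `PHENOTYPE` and the function `epistatic_gene` are hypothetical names introduced here, not part of the study.

```python
# Qualitative spontaneous-activity phenotypes of OR67d neurons, as described
# in the text. The labels are illustrative categories, not measured values.
PHENOTYPE = {
    frozenset(): "low",                       # wild type: sparse firing
    frozenset({"snmp"}): "increased",
    frozenset({"lush"}): "decreased",
    frozenset({"Or83b"}): "silent",
    frozenset({"snmp", "lush"}): "increased",  # retains high activity
    frozenset({"snmp", "Or83b"}): "silent",    # like Or83b alone
}


def epistatic_gene(gene_a, gene_b, phenotype=PHENOTYPE):
    """Return the gene whose single-mutant phenotype the double mutant shows.

    Returns None when the double mutant matches neither or both single
    mutants, in which case this simple rule cannot order the two genes.
    """
    double = phenotype[frozenset({gene_a, gene_b})]
    matches = [g for g in (gene_a, gene_b)
               if phenotype[frozenset({g})] == double]
    return matches[0] if len(matches) == 1 else None


print(epistatic_gene("snmp", "lush"))   # snmp
print(epistatic_gene("snmp", "Or83b"))  # Or83b
```

Under this rule, Snmp is epistatic to lush and Or83b is epistatic to Snmp, which is consistent with the ordering proposed in the text; the rule itself says nothing about mechanism.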
Figure 4 | SNMP mediates electrophysiological responses to cVA.
a, Representative traces of extracellular recordings of OR67d neurons
stimulated with 10% cVA in wild-type, snmp mutant, neuronal rescue, and
support-cell rescue animals. Bar above traces marks stimulus time (1 s).
Traces are from female flies, but no significant sex-specific cVA responses are
observed. b, Dose-response curves for cVA in the genotypes in a. Mean
responses are plotted (±s.e.m.; n = 39-47 sensilla; <4 sensilla per animal,
mixed genders). snmp mutant and support-cell Snmp rescue are highly
significantly different from wild-type and neuronal rescue animals at all
concentrations of cVA (ANOVA; P < 0.0001). c, Representative traces of
spontaneous activity in OR67d neurons in wild-type, snmp mutant, lush
mutant, snmp lush double-mutant, Or83b mutant and snmp Or83b double
mutant animals. d, Quantification of mean spontaneous activity in the
genotypes in c (±s.e.m.; n = 16-20, male flies). Bars labelled with different
letters are highly significantly different (ANOVA; P < 0.0001).


To investigate the specificity of SNMP function, we ectopically 
expressed in OR67d neurons a second receptor, OR22a, which is 
responsive to fruit esters, such as ethyl butyrate and pentyl acetate.
Although chemically related to cVA, OR22a ligands lack the long 
hydrophobic tail of this fatty-acid-derived pheromone (Fig. 5a). 
Ectopic expression of OR22a in wild-type OR67d-expressing 
neurons conferred responses to a panel of known OR22a ligands in 
addition to the endogenous cVA response (Fig. 5a, b), but not to a 
control odour, geranyl acetate, which activates neither OR67d nor 
OR22a (ref. 25). In snmp mutants, ectopic OR22a-dependent
responses were unaffected, but all cVA responses were lost (Fig. 5a, b).
The broad expression of SNMP in trichoid OSNs indicates that 
it might have a general function in pheromone detection. Because 
no other volatile pheromones have been identified in Drosophila, 
we tested whether SNMP is required for the activation of the 
moth (Heliothis virescens) pheromone receptor HR13 by
(Z)-11-hexadecenal, a component of the sex pheromone blend of this
species. As previously observed, expression of HR13 in
OR67d-expressing neurons conferred responsiveness to this pheromone
(Fig. 5c, d). This response was almost completely abolished in snmp 
mutants and restored by transgenic rescue of Snmp (Fig. 5c, d). 
Together, these experiments reveal a specific and conserved function 
for SNMP in mediating pheromone-evoked neuronal activity. 
OR67d and HR13 share <15% amino acid identity and their ligands 
have chemically distinct head groups, suggesting that it is the fatty- 
acid-derived hydrocarbon tail common to these pheromones that 
necessitates SNMP. 

Finally, we asked whether SNMP is required for the activation of 
OR67d by cVA in neurons not normally responsive to pheromones. 
We ectopically expressed OR67d in basiconic OSNs that lack the 
endogenous OR22a ligand-binding odorant receptor, but retain 
OR83b (ref. 24). All action potentials in these neurons can therefore 
be ascribed to OR67d/OR83b activity. Or22a mutant neurons 
expressing OR67d without SNMP exhibited spontaneous firing, 
but did not respond to cVA (Fig. 5e, f). In contrast, when OR67d 
was co-expressed with SNMP, significant responses to this phero- 
mone were observed (Fig. 5e, f); compared to the responses of native 
OR67d neurons, the frequency of action potentials was lower and 
exhibited slower rise and decay rates (Fig. 5e, f). Such differences 
may be due to the absence in basiconic sensilla of LUSH or
odour-degrading enzymes specialized to inactivate pheromone molecules.

Through a bioinformatic screen for insect olfactory transduction 
molecules, we have identified Drosophila SNMP as a CD36-related 
receptor broadly expressed in pheromone-sensing neurons, which is 
an essential co-factor for detection of the fatty-acid-derived phero- 
mone cVA. As mammalian CD36 has an important biochemical 
function in binding and membrane translocation of fatty acids, we
suggest SNMP directly captures pheromone molecules on the surface 
of OSN cilia—possibly retrieving them from odorant-binding pro- 
teins in the extracellular milieu—and facilitates their transfer to the 
odorant-receptor-OR83b complex (Fig. 5g). A recent study showed 
OR67d ectopically expressed without SNMP could be activated by 
cVA when the pheromone was directly applied to the sensillar cuticle 
overlying the OSN, indicating that pheromone receptors can be
directly stimulated by ligand. When pheromones are presented in 
an air stream to the receptor in its native environment, however, 
SNMP (and odorant-binding proteins) are essential. We suggest
that the combination of molecular specializations of pheromone- 
sensing trichoid neurons together contribute to the sensitivity of 
these cells and that SNMP-related proteins function in the detection 
of many insect pheromones. 

Figure 5 | SNMP is specifically required, and sufficient, for pheromone
detection. a, Representative traces from OR67d neurons stimulated with
10% cVA or 10% ethyl butyrate in wild-type, wild-type + ectopic OR22a, or
snmp-null mutant + ectopic OR22a animals. Bars above traces mark stimulus
time (1 s). Structures of compounds are depicted at the top. b, Quantification
of responses to cVA and OR22a ligands in the genotypes in a. Mean responses
are plotted (±s.e.m.; n = 16, female flies). Responses of wild-type and
snmp-mutant OR67d-expressing neurons ectopically expressing OR22a are highly
significantly different for cVA (ANOVA; P < 0.0001) but not for any OR22a
ligand (ANOVA; P > 0.4149). Geranyl acetate is a control odour that does not
stimulate OR67d or OR22a. c, Representative traces from OR67d neurons
stimulated with 10% cVA or 100% (Z)-11-hexadecenal in wild-type,
wild-type + ectopic HR13, snmp-null mutant + ectopic HR13, or snmp-null
mutant + ectopic HR13 + Snmp rescue animals. Bars above traces mark
stimulus time (1 s). Structures of pheromones are depicted at the top.
d, Quantification of responses to cVA and (Z)-11-hexadecenal in the
genotypes in c. Mean responses are plotted (±s.e.m.; n = 17-19, female flies).
Responses to (Z)-11-hexadecenal of wild-type and snmp-mutant OR67d
neurons ectopically expressing HR13 are highly significantly different
(ANOVA; P < 0.0001). HR13-dependent responses to (Z)-11-hexadecenal of
Snmp rescue animals are highly significantly different from both wild-type
and snmp mutant responses (ANOVA; P < 0.0001), indicating partial rescue.
e, Representative traces from Or22a mutant neurons expressing OR67d alone
or OR67d and SNMP, stimulated with control solvent paraffin oil or 100%
cVA. Bars above traces mark stimulus time (1 s). OR22a neurons reside in
basiconic sensilla with two neurons, visible as two distinct amplitudes
(labelled A and B) of action potentials: responses of OR67d in Or22a mutant
neurons are represented by the larger amplitude. The neuron with smaller
amplitude action potentials responds to both control and cVA stimuli.
f, Peristimulus time histograms for the genotype and stimulus combinations
in e, using the same colour scheme. Endogenous OR67d neuron response to
100% cVA is in grey. Mean responses are plotted (±s.e.m.; n = 9-12, mixed
genders). The activity of Or22a mutant neurons expressing both OR67d and
SNMP and stimulated with cVA is highly significantly different between time
points 3-9 s from that of Or22a mutant neurons expressing OR67d alone and/or
when stimulated with paraffin oil (ANOVA; P < 0.0001). g, Model of mechanistic
parallels between insect pheromone detection and bacterial pathogen
detection. Extracellular LBPs (lipopolysaccharide-binding proteins) may be
functionally analogous to odorant-binding proteins (OBPs). ORX is any of
the ligand-binding Drosophila pheromone receptors.

The mechanistic basis of CD36 ligand interactions and signalling is
still poorly understood in any biological system. Our results have
three important general implications. First, we show that SNMP
has a specific role in the detection of fatty-acid-derived odour
ligands. Because other CD36-related receptors are involved in binding
and transport of lipid-based molecules, for example in the
mammalian intestine, this protein family may represent specialized
receptors for extracellular fatty ligands of diverse biological origin 
and function. Second, we show that SNMP acts in concert with other 
transmembrane odorant receptors in OSN cilia in mediating phero- 
mone-evoked activity. Because CD36 was previously shown to act as 
a co-receptor for Toll-like receptors, we suggest that CD36-related
proteins have obligate transmembrane partners in all their cellular 
roles. 

Finally, our results reveal a molecular parallel in the mechanisms of 
intraspecific recognition through pheromone detection and patho- 
gen recognition through the innate immune system (Fig. 5g). CD36 
proteins in both invertebrates and vertebrates have been implicated 
in the recognition of specific lipid-derived products from bacterial 
cell walls, and coupling of this recognition through Toll-like receptors
to initiate the innate immune response. Notably, mammalian
CD36 has been proposed as a candidate fat taste receptor. Common
molecular recognition mechanisms in immune and chemosensory 
systems may therefore be widespread. 


METHODS SUMMARY 

Bioinformatics. Insect-specific orthologues were identified using the OrthoMCL
server (http://orthomcl.cbil.upenn.edu/cgi-bin/OrthoMclWeb.cgi). Gene
targeting of Snmp was performed essentially as described. Two null mutants, snmp¹
and snmp², arising from different starting insertions of the targeting construct,
were analysed.

Histology and immunocytochemistry. Two-colour in situ RNA hybridization
and immunofluorescence on antennal sections or whole-mount brains were
performed as described. A rabbit polyclonal antibody against SNMP was raised 
against the synthetic peptide TNPATNPATHHKMEHRERY and affinity- 
purified by Proteintech Group. 

Electrophysiology and odorants. Extracellular recordings in single sensilla of 
2-8-day-old flies were performed essentially as described. High-purity
odorants were obtained from Sigma-Aldrich, except cVA (purity ~99%) obtained
from Pherobank. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS

Bioinformatics. Insect-specific orthologues were identified using the OrthoMCL 
DB server (http://orthomcl.cbil.upenn.edu/cgi-bin/OrthoMclWeb.cgi), by
comparing the predicted complete proteomes (in April 2004) of Drosophila mel- 
anogaster, Anopheles gambiae, Homo sapiens, Mus musculus, Caenorhabditis ele- 
gans, Saccharomyces cerevisiae, Arabidopsis thaliana, Escherichia coli, Plasmodium 
falciparum and Plasmodium yoelii. A supplementary data set of insect-specific 
orthologues was obtained from the Anopheles gambiae genome-sequencing
project. Expressed sequence tag numbers for the corresponding Drosophila genes
were downloaded from Flybase (http://flybase.bio.indiana.edu/) in June 2004. 
Protein sequences were manually curated using BLAST (http://www.ncbi.nlm. 
nih.gov/BLAST/) and SMART (http://smart.embl-heidelberg.de) servers. Snmp 
was previously curated by Flybase/NCBI as CG7000-RA under accession number 
NM_142696. The best-fit phylogenetic tree in Fig. 1d was generated using the 
neighbour-joining algorithm in MacVector v9.0 with default parameters. 
Molecular biology. Complementary DNA was synthesized from insect tissues 
using the Absolutely RNA Microprep Kit (Stratagene) and Superscript First- 
Strand Synthesis System (Invitrogen). Gene-specific primers for RT-PCR were 
designed using Primer3 (ref. 29) to amplify ~500 bp spanning at least one
intron. All plasmid constructs were generated by amplification of the desired 
cDNA or genomic fragments with primers containing flanking restriction sites 
using the Expand High Fidelity PLUS PCR system (Roche) and Oregon-R anten- 
nal or appendage cDNA or genomic DNA as templates. PCR products were T:A 
cloned into pGEM-T Easy (Promega), sequenced and subcloned into appropriate
vectors as detailed below: Snmp-promoter-VP22-GAL4: the Snmp 5.412 kb
promoter region (nucleotides 16,998,115-16,992,704 in GenBank accession
AE014297) was subcloned into pVP22-GAL4 (ref. 30). lush-promoter-GAL4:
the lush 0.959 kb promoter region (nucleotides 19,599,310-19,598,352 in
GenBank accession AE014296) was subcloned into pCaSpeR-AUG-GAL4 
(ref. 30). UAS-Snmp: full-length Snmp open reading frame (GenBank accession 
NM_142696) was subcloned into pUAST (ref. 31). UAS-Snmp-GFP: full-length 
Snmp open reading frame without termination codon was subcloned 5’ of EGFP 
(Clontech) in pUAST. UAS-Snmp-YFP(2): full-length Snmp open reading frame 
without termination codon was subcloned upstream of DNA encoding a 10 amino
acid linker [(GGGGS)₂] and a carboxy-terminal YFP fragment of YFP(2) in
pUAST (ref. 11). UAS-Or67d: full-length open reading frame of Or67d was 
subcloned in pUAST. UAS-GFP-Or67d: full-length open reading frame of 
Or67d was subcloned 3' of EGFP (without termination codon) in pUAST. 
Snmp targeting construct: 5’ and 3’ homologous arms (16,998,850-16,993,851 
and 16,991,435-16,986,436 of GenBank accession AE014297, respectively) were
subcloned to flank the white reporter gene in CMC105 (ref. 13). The gene 
structure in Fig. 3a was generated using Genepalette v1.2 (ref. 32). 
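
As a quick consistency check on the coordinates quoted above, the stated fragment sizes can be recomputed from the nucleotide positions. The sketch below assumes the positions are inclusive on both ends (an assumption, although it reproduces the 5.412 kb and 0.959 kb figures given in the text); `span_bp` and the `regions` table are hypothetical names used only for illustration.

```python
def span_bp(first, last):
    """Length in bp of an interval given by two inclusive coordinates."""
    return abs(first - last) + 1


# Coordinates as quoted in the text (GenBank AE014297 and AE014296).
regions = {
    "Snmp promoter": (16_998_115, 16_992_704),
    "lush promoter": (19_599_310, 19_598_352),
    "Snmp 5' homologous arm": (16_998_850, 16_993_851),
    "Snmp 3' homologous arm": (16_991_435, 16_986_436),
}

lengths = {name: span_bp(*coords) for name, coords in regions.items()}
for name, bp in lengths.items():
    print(f"{name}: {bp} bp ({bp / 1000:.3f} kb)")
```

With inclusive coordinates the two promoter fragments come out at 5,412 bp and 959 bp, and each homologous arm at 5,000 bp.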

Insect strains. Drosophila stocks were maintained on conventional cornmeal- 
agar-molasses medium under a 12h light:12h dark cycle at 25°C. Wild-type 
Berlin (M. Heisenberg) was used for electrophysiological experiments and the 
wild-type Oregon-R strain was used for histology. Mutant alleles and transgenic 
lines used: Or83b¹, Or83b² (ref. 13), lush¹ (ref. 33), Or22a/bΔhalo (ref. 24),
Or67d-GAL4 (ref. 19), Or83b-GAL4 (ref. 34), Or22a-GAL4 (ref. 19),
UAS-YFP(1)-Or83b (ref. 11), UAS-CD8-GFP (ref. 35), UAS-Or22a (ref. 13), UAS-HR13
(ref. 8), UAS-ninaD (ref. 36), 70FLP,70I-SceI/CyO and 70FLP (ref. 37).

Specific genotypes of flies in the figures are listed below. Figure 2e:
Or83b-GAL4/UAS-Snmp-GFP;Or83b¹/+ (left) Or83b-GAL4/UAS-Snmp-GFP;Or83b¹/Or83b²
(right). Figure 2f: Or67d-GAL4,UAS-YFP(1)-Or83b/+ (left),
Or67d-GAL4/UAS-Snmp-YFP(2);snmp¹/snmp² (centre),
Or67d-GAL4,UAS-YFP(1)-Or83b/UAS-Snmp-YFP(2);snmp¹,Or83b²/snmp²,Or83b¹ (right).
Figure 3e: Or67d-GAL4/UAS-GFP-Or67d;snmp¹/+ (top)
Or67d-GAL4/UAS-GFP-Or67d;snmp¹/snmp² (bottom). Figure 3f:
Or67d-GAL4/UAS-CD8-GFP;snmp¹/+ (top) Or67d-GAL4/UAS-CD8-GFP;snmp¹/snmp²
(bottom). Figure 4a: Or67d-GAL4/+;snmp¹/snmp² (second trace)
Or67d-GAL4/UAS-Snmp;snmp¹/snmp² (third trace) lush-GAL4/UAS-Snmp;snmp¹/snmp²
(bottom trace). Figure 4c: snmp¹/snmp² (second trace) lush¹/lush¹ (third trace)
lush¹,snmp¹/lush¹,snmp² (fourth trace) Or83b¹/Or83b² (fifth trace)
snmp¹,Or83b²/snmp²,Or83b¹ (bottom trace). Figure 5a: Or67d-GAL4/UAS-Or22a
(middle) Or67d-GAL4/UAS-Or22a;snmp¹/snmp² (bottom). Figure 5c:
Or67d-GAL4/UAS-HR13 (second trace) Or67d-GAL4/UAS-HR13;snmp¹/snmp² (third
trace) Or67d-GAL4,UAS-Snmp/UAS-HR13;snmp¹/snmp² (bottom trace). Figure 5e:
Or22a/bΔhalo/Or22a/bΔhalo;Or22a-GAL4,UAS-Or67d/+ (top two traces)
Or22a/bΔhalo/Or22a/bΔhalo;Or22a-GAL4,UAS-Or67d/UAS-Snmp (bottom two traces).

Adult mosquitoes (Anopheles gambiae G3 strain; MRA-112) were obtained 
from MR4 (www.mr4.org) through the Centers for Disease Control and 
Prevention. 

Gene targeting screen. Gene targeting of Snmp was performed essentially as
described, using five independent insertions of the targeting construct.
From approximately 200,000 F, progeny, at least 6 null mutants were obtained,
which were confirmed by PCR on genomic DNA preparations from homozygous
mutant animals amplifying fragments corresponding to 16,990,010-16,990,525
(5'), 16,992,709-16,993,129 (Snmp) and 16,994,279-16,994,757 (3') in GenBank
accession AE014297. Two of these, snmp¹ and snmp², arising from different
starting insertions of the targeting construct, were retained for phenotypic analysis.
Histology and immunocytochemistry. Two-colour in situ RNA hybridization
was performed essentially as described using Or83b-FITC, Or67d-DIG, and
Snmp-DIG or -FITC RNA probes. Immunofluorescence on antennal sections
or whole-mount brains was performed as described. Primary antibodies:
rabbit anti-OR83b EC2, 1:5,000 (ref. 13), rabbit anti-LUSH, 1:1,000 (ref. 33), 
mouse monoclonal nc82, 1:10 (R. Stocker), rabbit anti-GFP, 1:1,000 (Molecular 
Probes), mouse anti-GFP 1:500 (Molecular Probes). A rabbit polyclonal 
antibody against SNMP was raised against the synthetic peptide 
TNPATNPATHHKMEHRERY (corresponding to the C-terminal 19 amino 
acids), affinity-purified by Proteintech Group and used at 1:1,000. Secondary 
antibodies: Alexa488- and Cy3-conjugated anti-mouse IgG or anti-rabbit IgG 
1:100 or 1:1,000 for whole-mount brains and antennal sections, respectively 
(Molecular Probes; Jackson Immunoresearch). All microscopy was performed 
using a Zeiss LSM 510 Laser Scanning Confocal Microscope. For the Protein 
Fragment Complementation assay, the intrinsic fluorescence signal of reconsti- 
tuted YFP was detected in fixed samples by excitation with an Argon Laser 
(excitation wavelength 488 nm) and collection of the emitted light with Band 
Pass filter 505-530. 

Electrophysiology and odorants. Extracellular recordings in single sensilla of 
2-8-day-old flies were performed essentially as described. Ten microlitres
of odorant was added to a 6 mm filter paper disk (Whatman), which was placed
inside a 1 ml tuberculin syringe (Becton, Dickinson and Company). A
charcoal-filtered airflow (35 ml s⁻¹) was used to deliver odours to the preparation through
a 10 ml serological pipette that was trimmed to remove the tapered tip, and the
cut end positioned 15 mm away from the preparation. Half this airflow was
diverted through the odour syringe during odour stimulation periods (1 s)
under the control of the Syntech CS-55 Stimulus controller. cVA (purity
~99%) was obtained from Pherobank. Other odorants were obtained from 
Sigma-Aldrich at high purity. Chemical Abstracts Service (CAS) numbers: ethyl 
butyrate (105-54-4), methyl butyrate (623-42-7), pentyl acetate (628-63-7), 
methyl hexanoate (106-70-7), ethyl hexanoate (123-66-0), methyl octanoate 
(111-11-5), geranyl acetate (105-87-3), (Z)-11-hexadecenal (53939-28-9). 
Odorants were diluted to 10% in paraffin oil, except cVA, which was used at a 
range of dilutions (as indicated in the figures), methyl hexanoate and ethyl 
hexanoate, which were used at 1%, and (Z)-11-hexadecenal, which was used 
at 100%. Trichoid sensilla innervated by OR67d neurons are proximally distrib- 
uted on the antenna and can be unambiguously identified by extracellular elec- 
trophysiological recordings of individual sensilla because they are unique in 
housing only a single OSN. We found that the onset of cVA responses varied 
slightly (usually <200 ms) between animals of the same genotype recorded on 
different days, most probably owing to small variations in the position of the 
odour delivery apparatus relative to the preparation. For quantification of
responses, we therefore determined the time of onset of the response of a control
wild-type sensillum to 100% cVA for each recording session. Corrected
responses for all recordings in the same session were quantified by counting spikes
in a 0.5 s window from this time point, subtracting the number of spontaneous
spikes in a 0.5 s window before stimulation, and doubling the result to obtain
spikes s⁻¹. Spontaneous activity was quantified by counting the spikes in a 5 s
window without stimulus, and dividing by 5 to obtain spikes s⁻¹. Peristimulus
time histograms (PSTHs) were generated by counting the numbers of spikes in
0.5 s bins from 2 s before to 7 s after odour stimulation for each trial, using custom
software written by M. Ditzen in IDL. These values were then averaged across all 
trials. After verifying that responses were normally distributed, we compared all 
genotypes for a given experiment by ANOVA, with genotype as the main effect, 
and adjusted the alpha level for planned post-hoc means comparisons. 
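
The quantification rules described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' IDL software: spike trains are assumed to be lists of spike times in seconds, and the function names (`count_spikes`, `corrected_response`, `spontaneous_rate`, `psth`) are hypothetical.

```python
def count_spikes(spike_times, start, stop):
    """Count spikes with start <= t < stop (times in seconds)."""
    return sum(1 for t in spike_times if start <= t < stop)


def corrected_response(spike_times, stim_time, onset_time, window=0.5):
    """Evoked minus spontaneous spike count, scaled to spikes per second.

    onset_time is the session-specific response onset measured on a control
    sensillum; dividing by a 0.5 s window equals doubling the count.
    """
    evoked = count_spikes(spike_times, onset_time, onset_time + window)
    baseline = count_spikes(spike_times, stim_time - window, stim_time)
    return (evoked - baseline) / window


def spontaneous_rate(spike_times, start=0.0, duration=5.0):
    """Spikes per second in an unstimulated window (5 s by default)."""
    return count_spikes(spike_times, start, start + duration) / duration


def psth(trials, stim_time, pre=2.0, post=7.0, bin_width=0.5):
    """Mean spike count per bin across trials, from `pre` s before to
    `post` s after stimulus onset."""
    n_bins = int(round((pre + post) / bin_width))
    totals = [0] * n_bins
    for spike_times in trials:
        for i in range(n_bins):
            lo = stim_time - pre + i * bin_width
            totals[i] += count_spikes(spike_times, lo, lo + bin_width)
    return [total / len(trials) for total in totals]


# Toy data (made up for illustration only).
trace = [0.3, 1.7, 2.15, 2.2, 2.3, 2.5, 3.0]
print(corrected_response(trace, stim_time=2.0, onset_time=2.1))  # 6.0
print(spontaneous_rate([0.5, 1.5, 4.0]))                         # 0.6
```

Half-open windows are used here so that a spike on a bin boundary is counted once; whether the original software treated boundaries the same way is not stated in the text.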

Drosophila hygrosensation requires the TRP channels water witch and nanchung

Lei Liu, Yuhong Li, Runping Wang, Chong Yin, Qian Dong, Huey Hing, Changsoo Kim & Michael J. Welsh


The ability to detect variations in humidity is critical for many 
animals. Birds, reptiles and insects all show preferences for specific
humidities that influence their mating, reproduction and
geographic distribution. Because of their large surface area to
volume ratio, insects are particularly sensitive to humidity, and its
detection can influence their survival. Two types of hygroreceptors
exist in insects: one responds to an increase (moist receptor)
and the other to a reduction (dry receptor) in humidity.
Although previous data indicated that mechanosensation might
contribute to hygrosensation, the cellular basis of hygrosensation
and the genes involved in detecting humidity remain
unknown. To understand better the molecular bases of humidity 
sensing, we investigated several genes encoding channels assoc- 
iated with mechanosensation, thermosensing or water transport. 
Here we identify two Drosophila melanogaster transient receptor 
potential channels needed for sensing humidity: CG31284, named 
by us water witch (wtrw), which is required to detect moist air, and 
nanchung (nan), which is involved in detecting dry air. Neurons 
associated with specialized sensory hairs in the third segment of 
the antenna express these channels, and neurons expressing wtrw 
and nan project to central nervous system regions associated with 
mechanosensation. Construction of the hygrosensing system with 
opposing receptors may allow an organism to very sensitively 
detect changes in environmental humidity. 

Previous work indicated that hygrosensing might involve mechanosensation.
For example, mechanical deformation activated moist
receptors and inhibited dry receptors in cricket and honeybee. In
addition, hygroreceptor cells in the capitulum sensilla in cockroach
responded to physical stresses. Therefore, we screened genes from
the transient receptor potential (TRP) and degenerin/epithelial Na⁺
channel (DEG/ENaC) families because members of both are involved
in mechanosensation in species ranging from nematodes to mammals.
We also tested several aquaporin genes because we thought
their involvement in water transport might extend to moisture sensing.
The D. melanogaster genome encodes 13 TRP, 24 DEG/ENaC
and seven aquaporin genes.

We tested hygrosensing behaviour by placing flies between two
tubes: air at ~100% humidity flowed into one and air at ~0%
humidity flowed into the other. After five minutes, between 12%
and 30% of wild-type flies distributed to the humid tube (Fig. 1a). We
investigated eight TRP genes using available deficiencies and mutant
lines (see footnote 1 in Supplementary Information). Although
deficiency lines eliminate only one allele of each TRP gene and reduce
expression of many other genes in the deleted region, they are a useful
tool for genetic screens. Deficiencies covering two TRP genes,
nanchung (nan) and CG31284, which we named ‘water witch’
(wtrw; Supplementary Fig. 1) because of its role in detecting moisture,
disrupted hygrosensing behaviour (Fig. 1a). The other TRP
genes, 11 DEG/ENaC, and five aquaporin genes studied either did
not show a haploinsufficient phenotype and/or were shown by other
studies to have no effect on hygrosensing behaviour (Supplementary
Figs 2, 3, and Supplementary Information footnote 1).

To evaluate further wtrw and nan, we asked if they are expressed in 
structures that sense humidity. Although Drosophila hygrosensing 
involves the distal antenna, earlier work disagreed about the location 
of hygrosensing neurons; hygrosensing has been attributed to neurons
innervating the arista and to 3rd antennal segment neurons of
coeloconic sensilla. Consistent with a previous report, ablating
either the 3rd segment and arista (bar B in Fig. 1a) or the arista alone
(bar C) eliminated hygrosensing behaviour. Because removing the
arista might damage the antenna, we also made a small cut in the 3rd
segment (bar D), an intervention that also disrupted the normal
humidity preference. These data implicate function of the 3rd antennal
segment, and potentially the arista, in hygrosensing. We also used
the GH86 promoter to express tetanus neurotoxin light chain
(Clostridial TNT in the UAS-TNT-H transgenic line). The GH86
promoter drives expression in many 3rd but not 2nd antennal segment
cells (Supplementary Fig. 2e), and TNT-H inactivates chemical
synapses, but can also cause some general cellular toxicity. Crossing
GH86-GAL4 and UAS-TNT-H lines disrupted moisture sensing
(Fig. 1a). In contrast, when the promoter for nompC, encoding a
mechanosensitive channel expressed in the 2nd antennal segment 
(Supplementary Fig. 2f), drove TNT-H, there was no effect. These 
and electrophysiological results (presented below) indicate that the 
3rd antennal segment is the site of moisture sensing. 

Figure 1b shows the Drosophila antenna. The 3rd segment contains 
three main morphologic types of multipored sensilla (basiconic, coelo- 
conic and trichoid) with most containing two to four neurons!*”°, 
The wtrw and nan promoters each drove reporter gene expression in 
~27-37 cells of the 3rd antennal segment (Fig. 1c). These promoters 
were also expressed in neurons in the 2nd antennal segment (data not 
shown). We obtained similar results using in situ hybridization 
(Supplementary Fig. 4a). The nan expression pattern was similar to 
the general arrangement of basiconic sensilla, although not to the 
distribution of specific subsets of large, thin or small basiconic 
sensilla’**°. The distribution of wtrw-expressing neurons most 
resembled that of coeloconic sensilla'*. nan-expressing neurons had 
a single long dendrite that seemed to extend into basiconic sensilla 
(Fig. 1d). In contrast, wtrw-expressing neurons had short dendrites 
(Fig. le); although we were not able to identify the associated sensilla, 
in some cases they seemed to sit near the base of small, thin sensilla 
that might represent coeloconic sensilla that have been associated 
with hygroreception'® (Supplementary Information footnote 2). 
Both wtrw- and nan-expressing neurons were also labelled by a 
neuron-specific antibody (22C10) (Fig. 1f g), and wtrw-GAL4/ 
UAS-GFP (green fluorescent protein) also showed some expression 
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Figure 1| Behavioural preference and expression of wtrw and nan. 

a, Behavioural preference of flies to moist or dry air and expression of nan and 
wtrw. White bars represent an example of a control for a specific experiment, 
and black bars were tests. Additional controls and tested lines are shown in 
Supplementary Figs 2 and 3. Controls: Canton S (CS) and elav-GAL4 lines 
(N = 6 and 9, respectively; for behavioural tests, N indicates the number of 
trials). TRP family deficiency lines covering two TRP genes are shown: nan 
(Bloomington stock no. 3124, Df(3L)fz-GF3b, p{wA®}66E/TM6B, Tb’ ca’ and 
no. 3125, Df(3L)fz-GS 1a, p{wA®}66E/TM3,Sb’) and wtrw (Bloomington 
stock no. 1883, Df(3R)dsx5,Ubx"*! sr'e/TM1 and no. 1898, Df(3R)D7,Ubx' 
e*/TM1) (N= 17 and 19, respectively). Organ ablation involved no 
intervention (A), removing the entire 3rd segment and arista (B), removing 
the arista (C), or making a small cut in the 3rd segment (D), as indicated by 
the red lines (N = 7, 6, 6 and 6, respectively). Genetic disruption was done by 
crossing either a nompC-GAL4 line, which drives expression in the 2nd 
antennal segment, or a GH86-GAL4 line, which expresses in the 3rd segment, 
to a UAS-TNT-H line (N = 7 and 11, respectively). Statistical analysis for all 
studies used a one-way analysis of variance (ANOVA) with post-hoc multiple 
comparisons of Sidak. Asterisks indicate P < 0.05. For this panel and all other 
quantitative figures, data are means ± s.e.m. b–g, Expression pattern of wtrw
and nan. b, Scanning electron photomicrograph showing Drosophila
antenna. c, nan-GAL4 (also called F-GAL4; ref. 15) and wtrw-GAL4 driving
UAS-DsRed™""’ overlaid on a differential interference contrast (DIC) image
of the 3rd segment of the antenna. The fluorescence images represent a stack 
of confocal sections. d, nan-GAL4/UAS-GFP line with fluorescence image in 
left panel, DIC image in middle, and overlay in the right panel. Dendrites of 
nan-expressing neurons at the base of and extending into a probable 
basiconic sensilla (white arrowheads). Inset shows scanning electron 
photomicrograph of a multi-pored basiconic sensilla. e, wtrw-GAL4/UAS- 
CD8-GFP neurons stained with anti-GFP-Alexa 555. Inset shows an 
additional example. f, g, wtrw-GAL4/UAS-GFP (f) or nan-GAL4/UAS-GFP 
(g) co-stained with neuron marker, 22C10. GFP channel, green; 22C10, red. 
in non-neuronal cells (Supplementary Fig. 4). Their expression in the
3rd antennal segment and the behavioural screen implicated wtrw 
and nan in sensing humidity and suggested that distinct types of 
sensory neurons express these two TRP genes. 

To test whether cells expressing nan and wtrw are involved in 
hygrosensation, we used nan and wtrw promoters to drive either 
TNT-H, which suppresses synaptic transmission, or a temperature-sensitive
shibire (shi), which blocks endocytosis and thereby neurotransmission at the
non-permissive temperature (29 °C). TNT-H and shi (at 29 °C) both impaired
hygrosensing behaviour (Fig. 2, and
Supplementary Fig. 2d), suggesting that cells expressing these genes 
contribute to humidity detection and that developmental changes 
did not cause the behavioural defect. 

The nan deletion mutants, nan*™ and nan” (ref. 15), failed to 
avoid high humidity. Because there are no available wtrw mutations, 
we targeted three different regions of wtrw transcripts with RNA 
interference (RNAi; Supplementary Fig. 5). Driving wtrw RNAi
expression with a promoter expressed in neurons (elav-GAL4) or a 
wtrw promoter (wtrw-GAL4), but not a nan or DEG/ENaC promoter 
(ppk12-GAL4) reduced the preference for dry air (Fig. 2, and 
Supplementary Fig. 2b). Thus, disrupting hygrosensation required 
wtrw RNAi expression in specific neuronal cells. As an independent
method of inhibiting WTRW, we expressed a dominant-negative 
WTRW and found that it too impaired hygrosensing behaviour 
Figure 2 | Hygrosensing behaviour of promoter-GAL4/UAS-TNT-H,
mutant, RNAi and dominant-negative lines. Details are as described in the 
Fig. 1 legend. Controls: CS and w1118 (N = 6 and 16, respectively). Promoter-
GAL4/UAS-TNT-H lines: driving UAS-TNT-H with nompC-GAL4 (as a
control), nan-GAL4 and wtrw-GAL4 (N = 6, 12 and 15, respectively). We 
obtained similar results with TNT-E (not shown). Mutations: homozygous 
nan** and nan®” (N = 9 and 6, respectively; Supplementary Information 
footnote 6). RNAi: wtrw-GAL4/+ and elav-GAL4 are shown as controls with 
matched genetic backgrounds (N = 11 and 13, respectively). elav-GAL4, 
nan-GAL4 or wtrw-GAL4 drove ppk12"™™ and wtrw®™“,, as indicated 

(N = 12, 27, 10, 13, 9, 11, 14 and 9, respectively). DN: UAS-wtrw?''!9/+ is
shown as a control (N = 6). elav-GAL4 or wtrw-GAL4 drove UAS-wtrw?N-!? 
(N =9 and 10, respectively). Rescue: wtrw-GAL4 driving UAS-wtrw”! was 
used to rescue the behavioural phenotype of the wtrw deficiency line (no. 
1898, Df(3R)D7, Ubx' e*/TM1) (N = 9 and 10, respectively). nan-GAL4/ 
UAS-nan was used to rescue nan®”’ (N = 12). Asterisk indicates P < 0.05, 
compared to controls. Cross indicates P < 0.05, compared to appropriate 
wtrw Df (deficiency) and nan” lines. 
(Fig. 2), although we cannot exclude the possibility that the dom-
inant-negative mutant protein inhibited related TRP channels. We 
rescued the hygrosensing defect in the wtrw deficiency and nan””” 
mutation lines by expressing wild-type wtrw and nan, respectively 
(Fig. 2). nan*™, wtrw-GAL4/UAS-wtrw®*“), and wtrw-GAL4/UAS- 
wtrw?™-1? flies showed normal responses to odorants in the T-maze 
used for humidity sensing (Supplementary Fig. 6; DN, dominant 
negative), and they distributed normally on a temperature gradient 
(15-30 °C, not shown). These results indicate that wtrw and nan have 
an important role in hygrosensing. An earlier report indicated that 
the TRP channel inactive (IAV) forms a complex with NAN and that 
both are required for normal hearing’'. Therefore, we tested an iav 
mutant and found that it also altered hygrosensing (Supplementary 
Fig. 7). 

To assess activity of hygrosensing neurons, we positioned an extra- 
cellular electrode at different locations in the 3rd antennal segment 
and recorded spiking activity evoked by dry and moist air (Fig. 3a). 
Some neurons responded to dry air and some to moist air (Fig. 3b), 
consistent with reports in several other insect species***. Although 
Figure 3 | Electrophysiologic response to dry and moist air. a, Diagram of
recording method. A reference electrode was inserted into the eye, and a 
tungsten electrode was inserted into the 3rd antennal segment or at the base of 
2nd segment. Dry or moist air was delivered to the antenna. b, Recording 
electrode placed in the 3rd antennal segment, and dry and moist air were 
delivered as indicated. c, Recording from an electrode positioned at the base 
of the 2nd antennal segment. Trace was from a HS-GAL4/+ control fly. Dry 
air, moist air, or air at room humidity was delivered as indicated. Bottom 
panels show expanded examples of response to dry and moist air. d, Summary 
of electrophysiologic data. Spikes s⁻¹ indicates the maximum frequency
during a 1 s recording during dry or moist air application. The number of flies 
tested was N = 13, 27, 11, 12, 12, 18, 11 and 9, respectively. e, Examples of 
responses to moist and dry air from CS, wtrw deficiency (Df(3R)D7,Ubx' 
e*/TM1 crossed to w1118), wtrw deficiency rescued with wtrw-GAL4/UAS-
wtrw"!, elav-GAL4/UAS-wtrw®N4?°, wtrw-GAL4/UAS-wtrw?'”?, 
homozygous nan”, and nan” rescued with nan-GAL4/UAS-nan. 
we recorded at the base of the arista on many occasions, we could
detect no response to changing humidity (Supplementary Infor- 
mation footnote 3). Thus, Drosophila possess distinct receptor cells 
for high and low humidity. 

Our success rate in obtaining recordings from individual moisture- 
sensitive neurons was <10%, making this method unsuitable for 
quantifying the response of mutants. As an alternative approach, we 
recorded at the base of the 2nd antennal segment. Because all 3rd 
segment sensory neurons project through the 2nd segment to the 
brain, an electrode placed there should record most of the activity 
generated in the antenna. Both moist and dry air generated strong 
responses (Fig. 3c). Moist air initiated an immediate response that 
abated over 5-15 s. Dry air induced a slower increase in frequency over 
~5 s. Ablating the arista had no acute effect, although with time,
responses to moist and dry air disappeared, consistent with an injury 
(Supplementary Fig. 8). 

The wtrw deficiency, RNAi and dominant-negative lines impaired 
the electrophysiological response to moist air, while leaving dry-air 
sensing unaffected (Fig. 3d, e). In contrast, nan? and nan*™ mutants 
eliminated the response to dry air, yet left intact activity to moist air. 
Moreover, transgenic wtrw and nan expression rescued the defective 
responses of the wtrw deficiency and nan®”, respectively. These data 
identify distinct roles for wtrw and nan; one is involved in sensing 
moist air and the other in sensing dry air. 

The fly central nervous system is segregated by specific modalities, 
with sensory neurons that detect specific stimuli projecting to the 
same primary brain centre, independent of their body position”. 
Thus, the central projection of a sensory neuron can suggest its mod- 
ality. In several insects, the olfactory centre is in the antennal lobe, the 
primary centre for taste is located in the suboesophageal ganglia, and 
the mechanosensory area is associated with the dorsal lobe (Fig. 4a)”*. 
To obtain additional clues about the sensory modality involved in 
hygrosensing, we traced the central projections of nan and wtrw 
neurons. Because the dorsal lobe is not well defined in Drosophila, 
we first examined the projection pattern of nompC, a gene associated 
with mechanosensation in flies”. This pattern showed that antennal 
nerve termini projected locally close to but not into the antennal lobe 
(Fig. 4b), a position associated with mechanosensory neurons in 
several other insects and called the antennal mechanosensory and 
motor centre (AMMC)”. The antennal nerve termini labelled in 
nan-GAL4/UAS-CD8-GFP and wtrw-GAL4/UAS-CD8-GFP (CD8 
is a lymphocyte marker) lines projected to a brain region similar to 
that marked by nompC-GAL4/UAS-GFP (Fig. 4c, d). The patterns for 
nan and wtrw were also similar to projections reported in cockroach, 
in which hygroreceptor axons lie at the margin of the antennal lobe”. 
Although these studies cannot determine whether the nan- and wtrw- 
expressing neurons arose from the 3rd antennal segment or from 
other sites, such as Johnston’s organ in the legs or 2nd antennal 
segments, they do indicate that these neurons did not project to 
the antennal lobe or suboesophageal ganglia. In addition, when we 
labelled 3rd antennal segment neurons with a tracer (Dil) in wtrw- 
GAL4/UAS-GFP flies, we found nerve terminals in the AMMC region 
showing co-localization of Dil and GFP (Supplementary Fig. 9). 

Our results raise interesting questions and speculations. How might 
these TRP channels sense humidity? The channels themselves prob- 
ably do not directly sense humidity, because electrolyte-containing 
solutions are required on the extracellular surface for channel func- 
tion. Rather, we speculate that they function as mechanosensors or 
thermosensors or modify the receptor’s function. First, mechanical 
stimulation can activate hygroreceptor cells in other insects (Sup- 
plementary Information footnotes 4 and 5)°°. Second, the appear- 
ance of their dendrites indicates wtrw-expressing neurons may be 
mechanosensory; mechanosensory neurons often have short den- 
drites that attach to the base of hairs via a dendritic sheath”. Third, 
wtrw- and nan-expressing neurons project to brain regions associated 
with mechanosensory function”. Fourth, wtrw belongs to the TRPA 
subfamily, some of which are gated by temperature changes". 
What is the logic in constructing a sensory system composed of
two distinct sets of receptors and cells, one set that detects an increase 
and another that detects a decrease in an environmental signal? Our 
data indicate that both sensors are required; disrupting either the 
moist or the dry receptors impaired the behavioural response to 
humidity. Opposing receptors would create a system poised to 
respond to changes in either direction. In the hygrosensing system, 
this would allow flies to detect subtle differences in humidity with 
greater sensitivity than absolute values of humidity. Consistent with 
this idea, hygrosensing cells displayed minimal basal activity, but 
responded to humidity changes. This arrangement differs signifi- 
cantly from some senses, such as olfaction and gustation. However, 
hygrosensing may not be the only sensory system with this design. 
Thermosensation might share a similar organization, because dis- 
tinct cells respond to heating and to cooling”, and TRP channels 
with varying temperature sensitivity have been reported’. The 
organization of humidity- and temperature-sensing systems may 
have evolved to allow organisms to identify environments between 
Figure 4 | Central nervous system projection pattern of nompC-GAL4, nan-
GAL4 and wtrw-GAL4 driving UAS-GFP. Images show frontal view of the 
adult central nervous system; they are all stacks of confocal images. a, Brain 
labelled by nc82, a mouse monoclonal antibody that labels neuropil”. The 
stereotypical glomeruli of the antennal lobe (blue dashed line), mushroom 
body (MB), suboesophageal ganglia (SOG), and ventro-lateral procerebrum 
(VLP) are indicated. An area associated with mechanosensation, the AMMC 
(white oval), is also shown. b, nompC-GAL4/UAS-GFP showing the 
projection pattern in brain. Note the varicosities of the axon termini within 
the AMMC. c, nan-GAL4/UAS-mCD8-GFP showing a view of whole brain, 
with nc82 staining in red. Yellow arrows indicate nerve tract from 
chordotonal organ in the leg (Supplementary Information footnote 7). White 
arrowheads indicate local interneurons that may be involved in other 
functions’. Note that the GFP-labelled (green) projections remain locally in 
the AMMC. d, wtrw-GAL4/UAS-mCD8-GFP co-stained with nc82. Note 
projections around the antennal lobe in the AMMC. Although it seems that 
some of the wtrw-expressing interneurons might have projected into the
AMMC, this was not the case when we examined individual confocal sections. 
We have examined many samples to assess this. 
extremes of a physical property, whereas in olfaction and gustation a
ligand primarily attracts or repels an organism. Thus, construction of 
the hygrosensing system with opposing receptors may allow the 
organism to sensitively detect environmental humidity differences 
critical for mating and survival. 

Finding TRP channels as the first candidate hygrosensors in any 
animal species paves the way to understand better this interesting 
sensory modality. It may also offer new approaches to elucidating the 
design of sensory systems. 


METHODS SUMMARY 


See Supplementary Information for detailed Methods and Materials. 

Cloning water witch and generation of transgenic fly lines. Water witch (wtrw) 
was cloned using primer sets designed to flank the predicted CG31284 messenger
RNA in Flybase (http://flybase.bio.indiana.edu/). A 2 kb genomic DNA sequence 
that was 5’ to the translational start site of wtrw was cloned into a GAL4 PTGAL 
vector and used to make transgenic flies. Transgenic RNAi flies were made using 
the SympUAST-w vector using wtrw sequence from 5’ 206 to 804 bp (RNAi1),
2,258-2,359 (RNAi2) or 2,468-2,592 (RNAi3). To produce a WTRW dominant
negative (DN)**, we generated a transgene expressing the amino terminus of 
WTRW including the first predicted transmembrane domain using the pUAST 
vector. To rescue the effects of the wtrw deficiency, the full-length wtrw com- 
plementary DNA was cloned into the pUAST and pCaSpeR-hs-act vectors. 
Behavioural assay. Hygrosensory behavioural assays were done on the basis of 
previously described methods, with some modification’. Briefly, in a trial, 20-50 
flies were introduced into a T-maze through an elevator. One tube received moist 
airflow with ~100% humidity and the other received dry air flow at ~0% 
humidity. Most of the flies had made a choice and stopped motion in 5 min. 
Electrophysiological recording. We used an extracellular recording technique 
(see Fig. 3a for a diagram). Briefly, living female flies were placed into a cut plastic 
pipette tip, with the fly head protruding from the tip. Dry or moist air was blown 
from 10 mm in front of the fly head. A sharp tungsten electrode was placed in the 
3rd segment of the antenna or at the base of the 2nd segment. A glass reference 
electrode filled with 0.1 M KCl was positioned in one eye. Voltage differences
between the reference and recording electrodes were amplified by an Axopatch- 
1D (Axon Instruments). To analyse spike frequency, we used Clampfit 9.0 soft- 
ware (Axon Instruments). The amplitude threshold was set at 2.5 times the 
baseline noise level and the number of spikes was counted automatically. 
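
The automatic spike count described above can be sketched roughly as follows. This is an illustrative reimplementation, not the Clampfit procedure itself: the function names are hypothetical, the baseline noise is assumed here to be the root-mean-square of a quiet stretch of the trace, and a spike is assumed to be an upward crossing of 2.5 times that value. The peak rate is taken as the largest number of spikes falling within any 1 s window.

```python
import math


def baseline_noise(samples):
    """Root-mean-square of a quiet baseline stretch (assumed noise estimate)."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def detect_spikes(trace, noise, factor=2.5):
    """Return sample indices where the trace rises above factor * noise."""
    threshold = factor * noise
    spikes = []
    above = False
    for i, value in enumerate(trace):
        # Count only the upward crossing, not every sample above threshold.
        if value > threshold and not above:
            spikes.append(i)
        above = value > threshold
    return spikes


def max_rate(spike_indices, sample_interval_s, window_s=1.0):
    """Largest spike count in any window shorter than window_s, per second."""
    times = [i * sample_interval_s for i in spike_indices]
    best = 0
    start = 0
    for end in range(len(times)):
        # Slide the window start forward until the window is shorter than window_s.
        while times[end] - times[start] >= window_s:
            start += 1
        best = max(best, end - start + 1)
    return best / window_s
```

With a sampling interval of 100 µs, `sample_interval_s` would be `1e-4`.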


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS
Fly maintenance and stocks. Drosophila stocks were reared on standard corn- 
meal-agar-molasses medium at constant 25°C with a 12h light and 12h dark 
cycle. The GH86-GAL4 line was a gift from G. Heimbeck. UAS-TNT-H was a gift
from C. J. O’Kane. UAS-shi"' was a gift from T. Kitamoto. Pyr’ and pyr’;Ge* were
gifts from J. Kim. All the deficiency lines for TRP and aquaporin genes and elav- 
GAL4 lines were obtained from the Bloomington Stock Center. The deleted 
chromosomal regions in these deficiency lines were based on Flybase and the 
Stock Center’s database information. 
Cloning water witch and generation of transgenic fly lines. We found that the 
DNA sequences of introns and exons were the same as those predicted for 
CG31284. The protein-coding sequence for wtrw was 2,958 bp. yw”? strains 
were used for transgenic injections. P-element-mediated transformation and 
subsequent fly crossings were performed following standard techniques. A 
2kb genomic DNA sequence that was 5’ to the translational start site of wtrw 
was amplified by PCR from genomic DNA purified from wild-type adult flies. 
This 2 kb DNA sequence should contain most of the promoter activity of wtrw 
and was cloned into a GAL4 PTGAL vector (a gift from D. F. Eberl). We gener- 
ated several independent P-element-mediated transgenics from this PTGAL 
vector. We studied UAS-wtrw®X4"-? and UAS-wtrw®%4""? on the X chro- 
mosome and UAS-wtrw* An UAS-wtrw® eee UAS-wirw®S43! on the 2nd 
or 3rd chromosome. To produce a WTRW dominant negative (DN)**, we gen- 
erated a transgene expressing the N terminus of WTRW including the first 
predicted transmembrane domain (predicted protein size is 653 amino acids) 
using the pUAST vector. We studied UAS-wtrw?? on the 2nd chromosome. 
To rescue the effects of the wtrw deficiency, the full-length wtrw cDNA was 
cloned into the pUAST and pCaSpeR-hs-act vectors (GenBank accession num- 
ber: U60735). UAS-wtrw™! inserted on the 2nd chromosome. Because the 
pCaSpeR-hs-act vector has a heat shock 70 promoter in front of the inserted 
cDNA, it will express wtrw mRNA after heat shock in the transgenic line. To 
express the hs-wtrw~” transgene (located on the 3rd chromosome), we placed the 
flies in a 37 °C incubator for 3 h a day for 3 days. 
Immunohistochemistry. Adult heads were dissected from the progenies of 
wtrw-GAL4/UAS-mCD8-GFP and nan-GAL4/UAS-mCD8-GFP crosses. Adult
heads were sectioned following a previously described protocol*®. Anti-GFP 
rabbit polyclonal serum (1:5,000, Invitrogen, Molecular Probes) and the 
monoclonal antibody 22C10 (1:500, Developmental Studies Hybridoma Bank, 
University of Iowa) were used. Alexa Fluor 488 goat anti-rabbit IgG (1:1,000, 
Invitrogen, Molecular Probes) and cyanine 3 (Cy3)-conjugated goat anti-mouse
IgG (1:500, Jackson ImmunoResearch) were used as secondary antibodies. The
staining was observed with a confocal microscope (BioRad). 

Behavioural assay. At the point when flies had made a decision regarding pre- 
ference of humid or dry air, the elevator was raised, so that flies could not change 
sides. We then counted the number of flies on each side and calculated the 
percentage on the moist side. For each genotype, we performed 6-27 such trials. 
To determine the average percentage of flies on the moist side, we calculated the 
mean and s.e.m. for the trials. Between 12% and 30% of our wild-type flies 
distributed to the moist side, a percentage point higher than a previous report’. 
This difference might have been due to different genetic backgrounds or other 
laboratory variables. To reduce variability, we avoided experiments in bad 
weather conditions and between 11:00 and 16:00, because flies were generally 
less active at those times. 
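
The per-trial score and its summary statistics can be written out as a short sketch. The function names are hypothetical and the example fly counts in the note below are invented for illustration; the s.e.m. is assumed to be the sample standard deviation (n − 1 denominator) divided by the square root of the number of trials.

```python
import math


def percent_moist(n_moist, n_dry):
    """Percentage of flies found on the moist side in one trial."""
    return 100.0 * n_moist / (n_moist + n_dry)


def mean_sem(values):
    """Mean and standard error of the mean across trials."""
    n = len(values)
    mean = sum(values) / n
    # Sample variance with the n - 1 denominator (an assumption).
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(variance / n)
```

For example, three hypothetical trials with (moist, dry) counts of (5, 20), (10, 30) and (6, 24) give per-trial scores that can be passed to `mean_sem` as a list.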

Electrophysiological recording. Data were recorded using p-Clamp7.0 software;
the sampling interval was 100 µs.

To deliver moist or dry air, a single source of air at a constant flow rate was
switched to pass through either a stone aerator in a water-filled flask or an aerator
in a flask filled with drierite beads. It then emerged from tubes positioned 10 mm
in front of the fly’s head. We measured relative humidity and temperature with a 
digital humidity and temperature meter (Control Company). Experiments were 
performed at room temperature, and no temperature changes were detected with 
interventions, suggesting the temperature changes were less than 0.1 °C. To test 
the effect of airflow alone, we did experiments on days when the relative humi- 
dity was ~20%, because the relative humidity of the compressed air source was 
~20%. We used the same tubing setup but left the aerators out of the water and 
the drierite beads. We observed no response to the airflow alone. 
Confocal imaging. To investigate the peripheral expression pattern and central 
projections of various genes, we used their promoters to generate promoter- 
GAL4 lines and crossed them to UAS-CD8-GFP (encoding a fusion protein 
between mouse lymphocyte marker CD8 and GFP that marks cell surface mem- 
branes and is concentrated in neuronal processes). Antenna or dissected adult 
brain was fixed in 4% paraformaldehyde in PBS for 30 min. Then, they were 
stained with rabbit anti-GFP-Alexa 555 (Invitrogen) to detect mCD8-GFP;
nc82 (a mouse monoclonal antibody that labels neuropil*”); or fluorescence
of the GFP-based fluorophore was viewed directly. Confocal images were
obtained with an LSM 510 META NLO (BioRad). 


30. Wolff, T. in Drosophila Protocols (eds Sullivan, W., Ashburner, M. & Hawley, R. S.) 
229-234 (Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, 2000). 
Roquin represses autoimmunity by limiting inducible
T-cell co-stimulator messenger RNA 


Di Yu’, Andy Hee-Meng Tan”, Xin Hu’, Vicki Athanasopoulos’”, Nicholas Simpson’, Diego G. Silva’, 
Andreas Hutloff*, Keith M. Giles’, Peter J. Leedman®®, Kong Peng Lam’, Christopher C. Goodnow’”* 


& Carola G. Vinuesa!* 


Immune responses are normally targeted against microbial patho- 
gens and not self-antigens by mechanisms that are only partly 
understood. Here we define a newly discovered pathway that pre- 
vents autoimmunity by limiting the levels on T lymphocytes of a 
co-stimulatory receptor, the inducible T-cell co-stimulator 
(ICOS). In sanroque mice homozygous for an M199R mutation 
in the ROQ domain of Roquin (also known as Rc3h1)', increased 
Icos expression on T cells causes the accumulation of lymphocytes 
that is associated with a lupus-like autoimmune syndrome. 
Roquin normally limits Icos expression by promoting the degra- 
dation of Icos messenger RNA. A conserved segment in the unusu- 
ally long ICOS 3’ untranslated mRNA is essential for regulation by 
Roquin. This segment comprises a 47-base-pair minimal region 
complementary to T-cell-expressed microRNAs including miR- 
101, the repressive activity of which is disrupted by base-pair 
inversions predicted to abrogate miR-101 binding. These findings 
illuminate a critical post-transcriptional pathway within T cells 
that regulates lymphocyte accumulation and autoimmunity, and 
highlight the therapeutic potential of partially antagonising the
ICOS pathway. 

A two-signal mechanism regulates T-cell responses in secondary 
lymphoid tissues”, whereby T-cell receptor engagement by antigen— 
major histocompatibility complex on an antigen-presenting cell 
(APC) only triggers T-cell accumulation and effector functions when 
a second co-stimulatory receptor on the T cell, CD28, is simulta- 
neously engaged by B7 proteins that are induced on the APC on 
exposure to microbes. ICOS, a CD28 paralogue, can provide co- 
stimulation for T-cell responses in the absence of CD28 and is critical 
for follicular helper T (TFH) cell survival, germinal centre reactions
and generation of B-cell memory”. The ligand for ICOS (ICOSL), 
unlike B7, is expressed constitutively on many APCs in the absence of 
microbe components, raising a paradox about how autoimmunity is 
avoided in the face of this second co-stimulatory system. 

The sanroque M199R mutation in Rc3h1 (Rc3h1san) causes
lupus associated with lymphadenopathy, splenomegaly and 
accumulation of T cells expressing increased Icos levels'. To test 
the contribution of increased Icos expression to the pathology of 
Rc3h1san/san mice, the Icos gene dosage was halved by interbreeding
with Icos knockout mice. CD4+ T cells from Rc3h1san/san Icos+/− mice
expressed 70% less Icos than cells from Rc3h1san/san Icos+/+ mice,
although this was still double that of wild-type mice (Fig. 1a and
Supplementary Fig. 2). Partial correction of Icos overexpression in
Rc3h1san/san Icos+/− mice was accompanied by a parallel reduction of
lymphadenopathy, splenomegaly, total T- and B-cell numbers, TFH
cell expansion and germinal-centre B-cell numbers in sanroque mice 
(Fig. 1b-e and Supplementary Fig. 3a, b). This effect is unlikely to 
reflect a nonspecific reduction in co-stimulation because halving 
the gene dose of Cd28 in sanroque mice using a similar strategy did 
reduce neither spleen nor lymph node size (Supplementary Fig. 4a, b).
Furthermore, the severity of the lymphadenopathy correlated closely 
with the levels of Icos expressed on naive T cells across the different 
groups of genetically manipulated mice (Supplementary Fig. 4c). 
These results indicate that Icos overexpression caused by Roquin 
(M199R) is an essential contributor to the lupus phenotype, and 
demonstrate that tight regulation of Icos expression by Roquin is 
crucial to prevent T- and B-cell accumulation. 

Because ectopic expression of Roquin in CD4+ T cells reduces
endogenous Icos protein expression’, we tested whether it regulated 
endogenous Icos mRNA abundance by overexpressing Roquin in 
stimulated EL4 T cells. In cells expressing wild-type Roquin, the 
quantity of Icos mRNA was halved compared to cells transfected with 
empty vector, whereas Roquin(M199R) was a less potent repressor of 
Icos mRNA (Fig. 2a). Manual annotation of ICOS mRNA by a BLAST
search of the expressed-sequence-tag database revealed a remarkably 
long ~2,000 base pair (bp) 3’ untranslated region (UTR) containing 
six highly conserved segments (Fig. 2b and Supplementary Table 1). 
To test the role of the ICOS 3’ UTR, Roquin was expressed in 
NIH3T3 cells using the pR-IRES-GFP retroviral vector together with 
a pR-IRES-CD4 retroviral vector expressing either full-length human 
ICOS complementary DNA (ICOSFL) or human ICOS complementary
DNA lacking the 3’ UTR (ICOSΔ3’UTR; Fig. 2c, d). Although
human ICOS expressed from ICOSFL was strongly repressed in green
fluorescent protein (GFP)+ cells co-expressing wild-type Roquin,
there was negligible repression when human ICOS was expressed
from ICOSΔ3’UTR (Fig. 2e). Using GFP to determine relative
Roquin levels in individual cells, repression of human ICOS expres- 
sion by wild-type Roquin was shown to be dose-dependent (Fig. 2f 
and Supplementary Fig. 5). 

We took advantage of the ICOS retroviral vector, which constitu- 
tively expresses a single bicistronic mRNA encoding ICOS and 
human CD4 (Fig. 2d), to confirm that the ICOS 3’ UTR was the 
target of Roquin’s repressive effect on the transcript. Provided 
ICOS 3' UTR sequences were present within the bicistronic 
mRNA, wild-type Roquin caused a dose-dependent repression of 
human CD4 expression (Fig. 2g; Supplementary Fig. 6). Notably, 
Roquin(M199R) was a less-efficient repressor of human ICOS 


¹Division of Immunology and Genetics, John Curtin School of Medical Research, The Australian National University, Canberra, 2601, Australia. ²Laboratory of Molecular and Cellular
Immunology, Biomedical Sciences Institute, Agency for Science, Technology and Research (A*STAR), Singapore 138673, Singapore. ³ARC Centre for the Molecular Genetics of
Development, Australian National University, Canberra, 2601, Australia. ⁴Molecular Immunology, Robert Koch-Institute, 13353, Berlin, Germany. ⁵Laboratory for Cancer Medicine,
The University of Western Australia Centre for Medical Research, Western Australian Institute for Medical Research, Perth, 6000, Australia. ⁶School of Medicine and Pharmacology,
The University of Western Australia, Perth, 6000, Australia. ⁷Australian Phenomics Facility, Canberra, 2601, Australia.


*These authors contributed equally to this work. 



©2007 Nature Publishing Group 


LETTERS 


expression (Supplementary Fig. 7), indicating that the sanroque 
mutation reduces but does not abolish this activity of Roquin. 

To demonstrate that this mechanism operates in mouse primary T 
cells expressing endogenous Roquin, ICOSFL, ICOSΔ3’UTR or the 3’
UTR alone of human ICOS (ICOS3’UTR) were inserted into the
pR-IRES-GFP vector (Fig. 2d). Mean GFP levels were higher in sanroque
CD4+ T cells compared to wild-type CD4+ T cells transduced with
constructs containing the ICOS 3’ UTR, regardless of the presence or
absence of the ICOS coding region (Fig. 2h). In contrast, the sanroque
mutation did not affect GFP levels from the construct lacking the 3’
UTR (ICOSΔ3’UTR; Fig. 2h).

Given the regulatory role of 3’ UTRs in mRNA stability and 
Roquin’s localization to cytoplasmic stress granules', from which 
some mRNAs are transported to processing bodies where mRNA 
degradation occurs'®, we next asked if ICOS mRNA is localized to 
stress granules and if its stability is regulated by Roquin. We visua- 
lized human ICOS mRNA in transduced cells using the MS2 plasmid 
expression system (Supplementary Fig. 8) and found ICOSFL mRNA
localized within stress granules and processing bodies (Fig. 3a, b). 
The rate of Icos mRNA decay was measured in stimulated EL4 cells 
transfected with Roquin or empty vector after inhibiting transcrip- 
tion. Roquin overexpression shortened the half-life of endogenous 
Icos mRNA to 46% of the empty vector control (Fig. 3c). 
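
A half-life comparison of this kind can be illustrated with a minimal sketch, assuming first-order (exponential) decay of the transcript after transcription is blocked. The function name and any time points or expression values fed to it are hypothetical; the fit is an ordinary least-squares line through the log-transformed mRNA levels, which is one common choice and not necessarily the one used for Fig. 3c.

```python
import math


def half_life(times, levels):
    """Estimate mRNA half-life from a log-linear least-squares fit.

    Assumes level(t) = level(0) * exp(-k * t), so ln(level) is linear in t
    with slope -k and the half-life is ln(2) / k.
    """
    logs = [math.log(v) for v in levels]
    n = len(times)
    t_mean = sum(times) / n
    l_mean = sum(logs) / n
    numerator = sum((t - t_mean) * (l - l_mean) for t, l in zip(times, logs))
    denominator = sum((t - t_mean) ** 2 for t in times)
    slope = numerator / denominator
    return -math.log(2) / slope
```

Dividing the half-life fitted for Roquin-transfected cells by that fitted for empty-vector cells gives the relative value reported as a percentage of control.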

To identify the cis-acting regulatory elements within the ICOS 3' 
UTR that mediate Roquin’s action, we introduced three human 
ICOS 3' UTR fragments of similar size—3’ UTR-F1, 3’ UTR-F2 
and 3’ UTR-F3, each containing at least one conserved region 
(Fig. 2c)—into the pR-IRES-CD4 vector (Fig. 2d) and measured 
the effect of co-expressing Roquin (Fig. 3d). Wild-type Roquin did 
not repress expression of human CD4 mRNA containing either 3’ 
UTR-F1 or 3’ UTR-F2, but significantly reduced expression of
human CD4 mRNA containing 3’ UTR-F3. This activity of 3’ 
UTR-F3 was diminished by the M199R mutation. The inhibitory 
effect of the 3’ UTR-F3 on mRNA abundance was confirmed by 
real-time polymerase chain reaction (PCR) measurements in sorted 
cells (Fig. 3e). In cells transduced with 3’ UTR-F3-CD4, but not 
control cells with 3’ UTR-F1-CD4 or empty human CD4 vector
(not shown), sorted GFP" cells expressing exogenous wild-type 
Roquin had 70% less human CD4 mRNA than GFP™' control cells 
in the same culture (Fig. 3e). 

Although AU-rich elements are the best characterized cis-acting 
elements for mRNA degradation”, there is no typical AU-rich element
within ICOS 3’ UTR-F3. Recent work has shown that
microRNAs (miRNAs), a class of small non-coding RNAs of 20-22 
nucleotides, base-pair with cis-regulatory sites within the 3’ UTRs of 
target mRNAs and mediate post-transcriptional repression of gene 
expression”. In silico analysis of ICOS 3'UTR-F3 using MiRanda’*”° 
revealed several putative miRNA target sites (Supplementary Fig. 9a, 
b). MicroRNA target region one (MTR1), a 47 bp region containing 
target sequences for miR-101, 103/107 and 338, and miRNA target 
region two (MTR2), a 47 bp region containing the target sequence for 
miR-149, were subcloned into the pR-IRES-CD4 retroviral vector 
(Fig. 2d). MTR1, but not MTR2, was able to mediate repression of 
human CD4 expression by wild-type Roquin but not by Roquin 
(M199R) (Fig. 4a). 

Because miR-101 is expressed in NIH3T3 cells and mouse T lymphocytes”’, 
we tested whether complementarity between MTR1 and 
miR-101 is required for repression by determining the activity of a 
mutant construct (MTR1mut) containing a two-nucleotide inversion 
within a region of MTR1 that is perfectly complementary to the 
so-called ‘seed’ of miR-101 (Fig. 4b). Complete pairing within the 
seed sequence is essential for miRNA recognition of its target 
sequence’. Mismatches within the seed region inhibited MTR1- 
mediated repression of human CD4 expression by Roquin (Fig. 4b). 
Ectopic expression of miR-101 precursor in EL4 cells reduced endogenous 
Icos mRNA levels (Fig. 4c). Consistent with a role in repressing 
ICOS, we found that miR-101 is differentially expressed in 
human naive, memory and TFH CD4+ cells (Fig. 4d), and inversely 
correlates with ICOS expression in these subsets”, with the highest 
levels found in naive (ICOSneg) cells, intermediate levels in memory 
cells (ICOSint) and lowest levels in TFH cells (ICOShi) (Fig. 4d). The 
levels of miR-101 and its primary miRNA transcript were higher in 

Figure 1 | Overexpression of Icos contributes to autoimmunity in sanroque 
(Rc3h1san/san) mice. a, Icos expression on naive (CD44low, left) and 
antigen-experienced (CD44high, right) splenic CD4+ T cells from 8-week-old 
C57BL × CBA H-2"* mice with the indicated genotypes. b, Spleen (top) and 
lymph nodes (bottom) from representative mice. c, Weights of spleen and 
combined lymph nodes. d, Total numbers of B cells (B220+) and CD4+ T 
cells from spleens in c. e, Total numbers of germinal-centre B cells and TFH 
cells from spleens in c. Each symbol represents one mouse. The lines show 
arithmetic means. Significant differences by Student’s t-test are marked by 
asterisks: one asterisk, P < 0.05; two asterisks, P < 0.01. 


©2007 Nature Publishing Group 


NATURE|Vol 450|8 November 2007 


sanroque naive CD4+ T cells than in the wild type (Supplementary 
Fig. 10), indicating that the M199R mutation of Roquin does not 
dysregulate Icos mRNA by means of a loss of miR-101 expression. 
The paradoxical increase in miR-101 indicates a compensatory upregulation. 
MiR-103 and miR-338 are both also predicted to recognize 
MTR1; ectopic expression of the miR-103 precursor, but not the 
miR-338 precursor, also reduced endogenous Icos mRNA levels 
(Supplementary Fig. 11). It is thus possible that several miRNAs 
act together to regulate Icos mRNA decay. 
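
The seed-mismatch test described above (a two-nucleotide inversion inside the region of MTR1 that pairs with the miR-101 seed) can be illustrated with a minimal sketch. The sequences and function names below are hypothetical placeholders, not the real miR-101 or ICOS 3' UTR sequences, and the seed is taken as miRNA positions 2-8, which is one common definition rather than the one necessarily used by the authors.

```python
# Toy sketch of miRNA seed pairing. All sequences below are hypothetical
# placeholders, not the real miR-101 or ICOS 3' UTR sequences.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA string (Watson-Crick pairs only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_site(mirna: str) -> str:
    """Target sequence (5'->3') that pairs perfectly with miRNA positions 2-8."""
    return reverse_complement(mirna[1:8])


def has_seed_match(target: str, mirna: str) -> bool:
    """True if the target contains a perfect match to the miRNA seed."""
    return seed_site(mirna) in target


def invert_adjacent(seq: str, i: int) -> str:
    """Swap the nucleotides at 0-based positions i and i+1."""
    return seq[:i] + seq[i + 1] + seq[i] + seq[i + 2:]


MIRNA = "UGCAAGUCCGAUAGCUUAGCA"         # hypothetical miRNA
WILD_TYPE = "AAUCGACUUGCAUUA"           # hypothetical target containing a seed site
MUTANT = invert_adjacent(WILD_TYPE, 6)  # two-nucleotide inversion inside the site

print(has_seed_match(WILD_TYPE, MIRNA))
print(has_seed_match(MUTANT, MIRNA))
```

In this toy case the wild-type target retains a perfect seed match and the inverted mutant does not, which mirrors the logic of the MTR1 mutant construct without reproducing its actual sequence.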

We next asked whether defective miR-101-mediated repression 
could explain increased expression of other genes in sanroque T cells’. 
Messenger RNA levels of neuropilin 1, important for the formation 
of an ‘immunological synapse’ between T cells and dendritic cells”°, 
are increased in sanroque naive CD4+ T cells, and neuropilin 1 is a 
predicted target of miR-101 (Supplementary Fig. 12a, b). Over- 
expression of wild-type Roquin in EL4 T cells reduced neuropilin 1 
mRNA compared to cells transfected with the empty vector. In con- 
trast, Roquin(M199R) increased the levels of neuropilin 1 mRNA 
(Supplementary Fig. 12c). As shown for Icos, ectopic expression of 
miR-101 also reduced endogenous neuropilin 1 mRNA levels 
(Supplementary Fig. 12d). 

The findings reported here demonstrate a newly discovered mRNA 
regulatory pathway, the partial failure of which leads to autoimmunity. 
Autoimmune lymphocyte accumulation in mice homozygous 

Figure 2 | Roquin represses ICOS through sequences in the ICOS mRNA 3’ 
UTR. a, Endogenous Icos mRNA measured by real-time PCR with reverse 
transcription (RT-PCR) in EL4 cells transfected with empty vector, wild-type 
Roquin or Roquin(M199R) and then stimulated with anti-CD3ε plus 
anti-CD28. The level of Icos mRNA in cells transfected with empty vector was 
assigned 100%. b, Conservation between full-length human and mouse Icos 
cDNA. Shaded areas represent conserved regions with more than 70% 
homology in 100 bp. CDS, coding sequence. c, The human ICOS cDNA 
fragments used. d, Schematic maps for retroviral vectors. CD4, human tailless 
CD4 cDNA; IRES, internal ribosome entry sequence; LTR, long terminal 
repeat; Ψ, packaging signal; MCS, multiple cloning site. e, Flow cytometric 
plots showing GFP and human ICOS expression on NIH3T3 cells transduced 
with the indicated ICOS vectors plus either Roquin or the empty (GFP-only) 
vector. The boxes show the gates used to define the GFP-high, GFP-low and 
GFP-negative populations. f, g, Mean fluorescent intensity (MFI) of human 
ICOS (f) and human CD4 (g) in the populations gated in e. Repression of ICOS 
and CD4 was quantified as explained in Supplementary Fig. 5. h, Activated CD4+ 
sanroque CD45.2 and wild-type CD45.1 cells were transduced with 
retroviruses packaged from pR-IRES-GFP vectors expressing human ICOS 
cDNA fragments, and GFP MFI was quantified. Data shown represent mean 
values ± s.d., with n = 3. Significant differences by Student’s t-test are marked 
by asterisks: one asterisk, P < 0.05; two asterisks, P < 0.01. 

for Roquin(M199R) results at least in part from failure to repress 
expression of the co-stimulatory molecule ICOS, particularly on 
naive T cells. ICOS expression is normally tightly regulated post- 
transcriptionally by a mechanism that involves a conserved miRNA 
binding sequence within the 3’ UTR. Whether Roquin binds directly 
to its target mRNAs and/or to the complementary miRNAs is not 

Figure 3 | Identification of the minimal region within ICOS 3’ UTR 
containing cis-acting elements for Roquin control of ICOS mRNA 
abundance. a, b, Intracellular localization of human ICOS mRNA in 293T 
cells co-transfected with plasmids expressing ICOSFL (pR-ICOSFL-MS2 
binding site(24)-IRES-CD4), TIA-1-DsRed (a, a stress granule marker) or 
DCP1a-DsRed (b, a processing-body marker), and MS2–YFP–NLS. Green 
fluorescence shows the localization of ICOS transcripts bound to MS2-YFP 
protein. Arrows indicate stress granules (a) and processing bodies (b). c, Icos 
mRNA levels in activated EL4 cells transfected with Roquin or empty vector, 
treated with actinomycin D for the times indicated. Endogenous Icos mRNA 
levels were measured using real-time RT-PCR and normalized to Actb. The 
amount of Icos mRNA at time 0 h was assigned 100%. Trendlines (Graphpad 
Prism) were fitted to predict the indicated mRNA half-lives. Data shown are 
mean values ± s.d. with n = 3. d, MFI of human CD4 in NIH3T3 cells 
co-transduced with vectors expressing ICOS 3' UTR-F1, -F2 or -F3 upstream of 
IRES-CD4, and vectors expressing either wild-type Roquin or 
Roquin(M199R) upstream of IRES-GFP. Data shown are mean values ± s.d. 
with n = 3. Statistically significant differences are marked by asterisks: 
Student’s t-test, two asterisks, P < 0.01. e, Human CD4 mRNA levels in 
NIH3T3 cells co-transduced with bicistronic retroviral vectors containing 
ICOS 3' UTR-F1- or 3’ UTR-F3-CD4 and Roquin-GFP. GFP-negative and GFP-high 
populations were FACS-sorted using the gates shown in the contour plots 
(left), and total RNA was isolated. The bar graph shows relative human CD4 
mRNA in Roquin-positive (GFP-high) cells compared to Roquin-null 
(GFP-negative) cells. The 
small increase in human CD4 mRNA in ICOS 3' UTR-F1 cells expressing 
Roquin was also observed in GFP-high cells expressing empty vector (not 
shown), indicating that this effect is not due to Roquin, but probably reflects 
enrichment for cell cycling among double-transduced cells. Data shown 
represent mean values ± s.d. with n = 4 different primer pairs designed to 
amplify different regions of CD4 cDNA. 

known. Roquin contains a potential RNA-binding CCCH domain 
and localizes within stress granules, in close association with processing 
bodies, leading us to speculate that Roquin forms part of a multiprotein 
complex within stress granules that directs certain mRNAs, 
including ICOS, to the route of miRNA-mediated decay in processing 
bodies. There are precedents for proteins that regulate miRNA-mediated 
repression by directing localization of target mRNAs 
to specific cytoplasmic compartments’’. Notably, a number of 
human and mouse autoantibodies target key elements of the RNA 

Figure 4 | A miRNA target site mediates the regulation of Icos mRNA by 
Roquin. a, MFI of CD4 assessed by flow cytometry in NIH3T3 cells 
retrovirally co-transduced with vectors expressing MTR1 or MTR2 
upstream of IRES-CD4, and wild-type Roquin or Roquin(M199R) upstream 
of IRES-GFP. b, The human miR-101 binding site within ICOS 3’ UTR, as 
predicted by miRanda. MTR1mut was constructed by reversing two adjacent 
nucleotides (box) in the predicted ‘seed’ sequence within MTR1. The bar 
graph shows the MFI of CD4 in NIH3T3 cells transduced with wild-type 
Roquin (GFP) and MTR1mut (human CD4) analysed as described in 
a. c, Endogenous Icos mRNA levels assessed by real-time RT-PCR in EL4 
cells transfected with miRNA precursor negative control and human miR-101 
precursor for 24 h. The level of Icos mRNA in cells transfected with 
miRNA precursor negative control was assigned 100%. d, Levels of mature 
miRNA-101 in human naive (CD4+CD45RO−CXCR5−ICOS−), memory 
(CD4+CD45RO+CXCR5 °*ICOSint) and TFH (CD4+CD45RO+ 
CXCR5highICOShigh) CD4+ cells sorted from tonsil were quantified using 
TaqMan miRNA assays. Data shown are mean values ± s.d. with n = 3. Each 
panel represents a separate experiment from different individuals. 
Statistically significant differences are marked by asterisks: Student’s t-test, 
one asterisk, P < 0.05; two asterisks, P < 0.01; NS, no significant difference. 

interference machinery, including argonaute proteins and Dicer™. It 
is intriguing that stress granules and their other components, notably 
TIA-1 (T-cell restricted intracellular antigen-1), are more prominent 
in primed T cells”. Taken together, this raises the possibility that 
dysfunction of the machinery for RNA decay and interference lies 
at the core of the pathogenesis of autoimmune diseases. The findings 
here demonstrate a unique mechanism to prevent autoimmunity by 
limiting T-cell receptivity to co-stimulation. This contrasts with, and 
probably complements, the mechanisms that control expression on 
APCs of B7 co-stimulatory ligands for CD28. The marked effect of 
quantitative shifts in ICOS expression demonstrated here highlights 
the potential for treating autoimmunity in some individuals by par- 
tial antagonism of ICOS-ICOSL interaction. 


METHODS SUMMARY 

Mice. Sanroque C57BL/6 H-2°? mice were used for in vitro experiments, and 
C57BL X CBA H-2** mice were used for in vivo experiments and crosses with 
Icos−/− and Cd28−/− mice. All animal procedures were approved by the 
Australian National University Animal Ethics and Experimentation Committee. 
Cells, retroviruses, transduction and transfection. Fragments of human ICOS 
cDNA were amplified from the BC028210 cDNA clone and were inserted into 
retroviral vectors. Retroviral supernatants were harvested from packaging 
Phoenix cells transfected with individual retroviral constructs. NIH3T3 cells 
and primary CD4+ T cells were transduced with retroviruses by spinoculation. 
Roquin cDNA was subcloned into pcDNA3.1/CT-GFP TOPO TA fusion vector, 
which was used to transfect EL4 cells. MicroRNA precursors and the negative 
control (Ambion’s negative control 1) were transfected into EL4 cells using the 
siPORT NeoFX. Plasmids expressing the ICOSFL (pR-ICOSFL-MS2 binding 
site(24)-IRES-CD4), TIA-1-DsRed or DCP1a-DsRed, and MS2–YFP–NLS 
(nuclear localization signal) were co-transfected into 293T cells. 

Flow cytometry and immunofluorescence. Flow cytometry and immunofluor- 
escence were performed as described’. 

Real-time PCR with reverse transcription. Complementary DNA expression 
was determined with the ABI Prism 7900 sequence-detection system and SYBR 
Green reagents. The amount of mRNA was expressed relative to that of β-actin 
(Actb) or Ube2d2 and calculated as 2^(−ΔCT) (see Methods). For miRNA assessment, 
human naive, memory and TFH CD4+ cell subsets were sorted from tonsil, and 
RNA was isolated using the mirVana miRNA isolation kit. Mature miRNA levels 
were quantified using the TaqMan miRNA assay. 

Computational methods. Human (BC028210) and mouse (AK030827) ICOS 
nucleotide sequences were aligned using LAGAN (limited area global alignment 
of nucleotides) *°. MicroRNA target sequences within Icos mRNA were predicted 
using the MiRanda software”. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 

METHODS 

Mice. sanroque C57BL/6 H-2°? mice were used for in vivo experiments; for 
in vitro experiments and crosses to Icos−/− and Cd28−/− mice, the test and 
control littermates were on a mixed C57BL/6, B10.BR and CBA background. All 
mice were housed in specific pathogen-free conditions in the Australian 
Phenomics Facility. 

Cells, retroviruses, transduction and transfection. NIH3T3 cells were grown in 
DMEM medium supplemented with 10% FCS and antibiotics. Various frag- 
ments of human ICOS cDNA were amplified by PCR from the BC028210 
cDNA clone and were inserted into retroviral vectors. Primer sequences are 
available on request. Retroviral supernatants were harvested from packaging 
Phoenix cells transfected with individual retroviral constructs. NIH3T3 cells 
and primary CD4+ T cells were transduced with retroviruses by spinoculation. 

EL4 cells were grown in RPMI medium supplemented with 10% FCS and 
antibiotics. Roquin cDNA was subcloned into the pcDNA3.1/CT-GFP TOPO 
TA fusion vector (Invitrogen), which was then used to transfect EL4 cells. 

MicroRNA precursors and the negative control (number 1) were purchased 
from Ambion and transfected into EL4 cells using the siPORT NeoFX (Ambion). 

293T cells were grown in DMEM medium supplemented with 10% FCS and 
antibiotics. Plasmids expressing the ICOSFL (pR-ICOSFL-MS2 binding site(24)- 
IRES-CD4), TIA-1-DsRed or DCP1a-DsRed, and MS2–YFP–NLS were 
co-transfected into 293T cells. 
Real-time PCR with reverse transcription. RNA was isolated using TRIzol 
reagent (Invitrogen) and reverse-transcribed with oligo(dT) using Superscript 
I RT enzyme (Invitrogen). Complementary DNA expression was determined 
with the ABI Prism 7900 sequence-detection system and SYBR Green reagents 
(PE Biosystems). Primer sequences are available on request. Fluorescence signals 
were measured over 40 PCR cycles, and the cycle at which signals crossed a 
threshold set within the logarithmic phase (CT) was recorded. The CT for the 
target gene was subtracted from the CT for Actb or Ube2d2 (ΔCT). The relative 
amount of mRNA was calculated as 2^(−ΔCT). 
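
The relative-quantification step can be sketched as follows. This is a minimal illustration, not the authors' analysis script: the CT values are invented, the function names are hypothetical, and ΔCT is taken as CT(target) − CT(reference), the usual convention under which 2^(−ΔCT) rises with target abundance. That sign convention is an assumption about what the text intends.

```python
def delta_ct(ct_target: float, ct_reference: float) -> float:
    """Delta-CT, assumed here to be CT(target) - CT(reference)."""
    return ct_target - ct_reference


def relative_expression(ct_target: float, ct_reference: float) -> float:
    """Amount of target mRNA relative to the reference gene, 2^(-delta CT)."""
    return 2.0 ** (-delta_ct(ct_target, ct_reference))


def percent_of_control(sample: float, control: float) -> float:
    """Express a sample's relative expression as a percentage of a control."""
    return 100.0 * sample / control


# Hypothetical CT values (not data from the paper): target gene vs. Actb.
control = relative_expression(ct_target=24.0, ct_reference=18.0)
treated = relative_expression(ct_target=25.0, ct_reference=18.0)

print(control, treated, percent_of_control(treated, control))
```

With these invented numbers, a one-cycle delay in the target CT at an unchanged reference CT corresponds to half the relative mRNA amount, which is how a control sample can be "assigned 100%" and other samples scaled against it.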

To measure levels of individual mature miRNAs, human naive, memory and 
TFH CD4+ cell subsets were sorted from tonsil using a FACSAria (Becton 
Dickinson) and RNA was isolated using the mirVana miRNA Isolation Kit 
(Ambion). Mature miRNA levels were quantified using the TaqMan miRNA 
assay (Applied Biosystems). 
Activation of EL4 cells and primary CD4+ cells. To assess endogenous Icos 
mRNA levels in EL4 cells, these were first transfected with empty vector, 
wild-type Roquin or Roquin(M199R) and were then stimulated with plate-bound 
anti-CD3ε (1 µg ml−1) plus anti-CD28 (4 µg ml−1) for 16 h. For experiments 
assessing Icos mRNA decay in EL4 cells, these cells were transfected with 
Roquin or empty vector, stimulated with plate-bound anti-CD3ε (1 µg ml−1) 
plus anti-CD28 (4 µg ml−1) for 6 h and treated with the transcriptional 
inhibitor actinomycin D (10 µg ml−1) for the times indicated. 

For transduction of primary T cells, CD4+ T cells were magnetically isolated 
from spleen and lymph nodes of sanroque (CD45.2) and wild-type C57BL/6 
(congenic, CD45.1) mice. Wild-type and sanroque CD4+ cells were co-cultured 
and stimulated with plate-bound anti-CD3ε (2 µg ml−1) plus anti-CD28 
(5 µg ml−1) for 24 h before transduction with retroviruses packaged from 
pR-IRES-GFP vector containing the indicated ICOS cDNA fragments. 
Computational and statistical methods. The human ICOS nucleotide sequence 
(BC028210) was aligned with mouse Icos nucleotide sequence (AK030827) using 
pairwise alignment software LAGAN” through the VISTA server 
(http://genome.lbl.gov/vista/index.shtml). 

For assessment of mRNA decay after treatment with actinomycin D, trendlines 
were fitted using the Graphpad Prism software; these were used to predict 
mRNA half-lives. 
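
One way to estimate a half-life from such a time course is sketched below. The source does not state which model Prism fitted, so the log-linear least-squares fit of a single exponential decay used here is an assumption, the function name is hypothetical, and the data points are synthetic (generated from an exact 2 h half-life), not measurements from the paper.

```python
import math


def fit_half_life(times_h, percent_remaining):
    """Estimate half-life (hours) by least-squares fit of ln(percent) vs. time.

    Assumes single-exponential decay: percent(t) = 100 * exp(-k * t),
    so the slope of ln(percent) against t is -k and half-life = ln(2) / k.
    """
    log_values = [math.log(p) for p in percent_remaining]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(log_values) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, log_values))
    slope = sxy / sxx
    return math.log(2) / (-slope)


# Synthetic time course with mRNA at time 0 h set to 100% (not real data).
times = [0.0, 1.0, 2.0, 4.0]
remaining = [100.0 * 0.5 ** (t / 2.0) for t in times]

half_life = fit_half_life(times, remaining)
print(round(half_life, 3))
```

Comparing the fitted half-lives of two conditions as a ratio gives a figure of the same kind as the reported shortening of the Icos mRNA half-life to 46% of the empty-vector control.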

MicroRNA target sequences within Icos and neuropilin 1 (Nrp1) mRNAs were 
predicted by the MiRanda software’. 

An epigenetic activation role of Piwi and a 
Piwi-associated piRNA in Drosophila melanogaster 


Hang Yin'* & Haifan Lin’? 


Heterochromatin, representing the silenced state of transcription, 
consists largely of transposon-enriched and highly repetitive 
sequences. Implicated in heterochromatin formation and tran- 
scriptional silencing in Drosophila are Piwi (P-element induced 
wimpy testis)'? and repeat-associated small interfering RNAs 
(rasiRNAs)*>. Despite this, the role of Piwi in rasiRNA expression 
and heterochromatic silencing remains unknown. Here we report 
the identification and characterization of 12,903 Piwi-interacting 
RNAs (piRNAs) in Drosophila, showing that rasiRNAs represent a 
subset of piRNAs. We also show that Piwi promotes euchromatic 
histone modifications and piRNA transcription in subtelomeric 
heterochromatin (also known as telomere-associated sequence, or 
TAS), on the right arm of chromosome 3 (3R-TAS). Piwi binds to 
3R-TAS and a piRNA uniquely mapped to 3R-TAS (3R-TAS1 
piRNA). In piwi mutants, 3R-TAS loses euchromatic histone 
modifications yet accumulates heterochromatic histone modifica- 
tions and Heterochromatin Protein 1a (HP1a). Furthermore, the 
expression of both the 3R-TAS1 piRNA and a white reporter gene 
in 3R-TAS becomes suppressed. A P element inserted 128 base 
pairs downstream of the 3R-TAS1 piRNA coding sequence restores 
the euchromatic histone modifications of 3R-TAS and the expres- 
sion of 3R-TAS1 piRNA in piwi mutants, as well as partly rescuing 
their defects in germline stem-cell maintenance. These observa- 
tions suggest that Piwi promotes the euchromatic character of 
3R-TAS heterochromatin and its transcriptional activity, opposite 
to the known roles of Piwi and the RNA-mediated interference 
pathway in epigenetic silencing. This activating function is prob- 
ably achieved through interaction with at least 3R-TAS1 piRNA 
and is essential for germline stem-cell maintenance. 

Non-coding small RNAs in the nucleus have been proposed to 
provide a sequence-specific interface between a DNA sequence and 
its epigenetic state, presumably by their base-pairing with genomic 
DNA or nascent RNA®’. Recent studies in the fission yeast have 
indicated that RNA-interference (RNAi)-mediated heterochromatin 
assembly occurs by means of a self-enforcing loop mechanism*"’. A 
central player of this loop is the RITS (RNAi-induced initiation 
of transcriptional gene silencing) complex, which contains a 
chromodomain-containing protein (Chp1), Argonaute 1 (Ago1), 
Tas3, and siRNAs”'*. Ago1 confers the sequence specificity by binding 
to siRNAs and recruits other chromatin proteins to initiate the 
heterochromatization’. Despite this progress, the role of non-coding 
small RNAs in epigenetic regulation in higher organisms remains 
largely unexplored. 

To examine this role, we focus on Piwi and its interacting piRNAs 
in Drosophila. Piwi is an Ago/Piwi protein that was initially identified 
to be essential for stem-cell self-renewal'*. Subsequently, it was implicated 
in heterochromatin formation, transposon silencing, and clustering 
of multiple copies of transgenes through the RNAi-mediated 
pathway'’*'*71°. Piwi interacts with piRNAs*>, bears RNA cleavage 
activity*, and may participate in an ‘amplification cycle’ that accelerates 
piRNA biogenesis”’”. 

To identify Piwi-interacting piRNAs systematically, we conducted 
immunoprecipitation to purify the Myc-Piwi complex from ovaries 
of adult flies carrying a fully functional myc—piwi transgene 
(Supplementary Fig. 1a)'*. Small RNAs ranging from 18 to 32 nucleotides 
(nt) in length were specifically precipitated with the Myc–Piwi 
complex (Fig. 1a, in which 24–26-nt RNAs are visible). We recovered 
19,048 candidate small RNA clones with perfect matches in the 
Drosophila melanogaster genome, which represent 13,299 unique 
Piwi-associated small RNAs. Of these, about 8.7% match known 
non-coding RNAs (Supplementary Fig. 1b). The remaining 12,903 
small RNAs are Piwi-associated piRNAs, which show a gaussian 
distribution in size and have a peak at 24—26 nt (Fig. 1b). Of these, 
55.2% contain U as the first 5’ nucleotide, a bias similar to that in 
mammalian piRNAs, whereas the second 5’ nucleotide shows a 
strong bias against U (Fig. 1c). A total of 10,792 piRNAs can be 
mapped to the assembled genome (Fig. 1d and Supplementary Figs 
1c—4). Of these, 7,651 (59.3%) are mapped to transposons (9.7% of 
the assembled genome), especially LTR (long terminal repeat) and 
LINE (long interspersed nuclear element) types of retrotransposon 
(Fig. 1d). In contrast, Piwi-associated piRNAs are underrepresented 
in gene-coding sequences, intergenic regions, and simple or low- 
complexity repeats. Along chromosomes, Piwi-associated piRNAs 
are highly enriched in pericentromeric regions and subtelomeric 
regions, which contain a high density of transposons (Supplemen- 
tary Figs 1c—4). These results are consistent with recent findings that 
Piwi subfamily proteins bind to transposon-derived piRNAs**'” and 
echo the role of Piwi in the epigenetic regulation of transposons and 
tandem transgenes’ *’®. 
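
The mapping percentages in the paragraph above can be checked with a short calculation. The counts are taken from the text; the variable and function names are hypothetical, and the enrichment ratio at the end is an illustrative derived quantity that the text itself does not report.

```python
# Counts quoted in the text.
TOTAL_PIRNAS = 12_903       # Piwi-associated piRNAs
MAPPED_PIRNAS = 10_792      # piRNAs mapped to the assembled genome
TRANSPOSON_PIRNAS = 7_651   # piRNAs mapped to transposons
TRANSPOSON_GENOME_PERCENT = 9.7  # transposon share of the assembled genome


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# The quoted 59.3% matches the share of ALL piRNAs, not of mapped piRNAs only.
share_of_all = percent(TRANSPOSON_PIRNAS, TOTAL_PIRNAS)
share_of_mapped = percent(TRANSPOSON_PIRNAS, MAPPED_PIRNAS)

# Illustrative enrichment over the genomic transposon fraction (not in the text).
enrichment = share_of_all / TRANSPOSON_GENOME_PERCENT

print(round(share_of_all, 1), round(share_of_mapped, 1), round(enrichment, 1))
```

The check shows which denominator the reported 59.3% uses, which matters when comparing it with the 9.7% of the assembled genome occupied by transposons.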

To investigate the potential function of Piwi and its interacting 
piRNAs in epigenetic regulation, we focused on the 20-nt 3R-TAS1 
piRNA, which is uniquely mapped to 3R-TAS (Fig. 2a). We chose this 
piRNA for five reasons: first, TAS is genetically well characterized 
heterochromatin crucial for telomere function and genome integ- 
rity’’*°; second, TASs share structural similarity between eukaryotes, 
implicating their functional conservation”'; third, 3R-TAS is com- 
posed of tandem repeats highly homologous to LTR regions of 
the Invader4 retrotransposon (Supplementary Fig. 5), providing an 
opportunity to study the role of retrotransposon-originated piRNAs; 
fourth, 3R-TAS1 piRNA is uniquely mapped to 3R-TAS (Supple- 
mentary Fig. 5), which allows us to establish a one-to-one functional 
relationship between a specific piRNA and its genomic sequence; and 
last, we previously identified a P-element insertional mutation, 
P{w+,ry+}A4-4, inserted 128 base pairs (bp) downstream of the 
3R-TAS1 piRNA coding sequence (Fig. 2a), as the strongest 
suppressor of piwi for its germline stem-cell phenotype”. One copy 
of the P{w+,ry+}A4-4 allele restores germline stem cells in about 70% 
of the homozygous piwi’ mutant females”. This suggests an important 
role for 3R-TAS in germline stem-cell maintenance that probably 
resides in niche cells where Piwi functions’. 

We first used electrophoretic mobility-shift assays to confirm that 
Piwi binds directly to 3R-TAS1 piRNA (Supplementary Results and 
Supplementary Fig. 6). We then examined 3R-TAS1 piRNA expres- 
sion, which was detected in wild-type cells in both the germline and 
soma and was enriched in nuclei, suggesting its nuclear function 
(Fig. 2b, c). However, it is not detectable in piwi mutants, suggesting 
that Piwi is required for its transcription and/or its processing/ 
stability. 

To distinguish between these two possibilities, we first examined 
whether P{w+,ry+}A4-4 affects 3R-TAS1 piRNA expression. We reasoned 
that P{w+,ry+}A4-4, being a 20-kilobase (kb) euchromatic 
sequence inserted at the heterochromatic 3R-TAS1 piRNA locus, is 
more likely to affect the local chromatin state and transcription of 
3R-TAS1 than its processing and/or stability. As expected, this 
piRNA is significantly overexpressed in P{w+,ry+}A4-4 flies (Fig. 
2c, d). In precise excision revertants, its expression level is reduced 
to that in the wild type (Supplementary Fig. 7a), indicating that the 
P-element insertion is the cause of its overexpression. Moreover, 
P{w+,ry+}A4-4 rescues its expression in piwi mutants (Fig. 2d). 
These observations favour a role for Piwi in regulating the transcription 
of its precursor. The function of Piwi in its processing or 
stability, if any, must be redundant and thus can be replaced by 
Ago3 and/or Aubergine in piwi mutants. 

P{w+,ry+}A4-4 rescues both 3R-TAS1 piRNA expression and 
germline stem-cell function in piwi mutants, suggesting the involvement 
of the 3R-TAS1 piRNA and potentially other piRNAs from 
3R-TAS in germline stem-cell maintenance. To assess the range of 
the P{w+,ry+}A4-4 effect in 3R-TAS, we examined the expression of 
two piRNAs closest to P{w+,ry+}A4-4 and 3R-TAS1 piRNA: the 
TAS2 piRNA on the centromeric side, only 54 bp from P{w+,ry+}A4-4, 
and the HeT-A1 piRNA mapped to the HeT-A retrotransposon 
on the telomeric side. Because neither piRNA is unique to 3R-TAS, 
our nuclease protection assays reflect the effects of P{w+,ry+}A4-4 on 
the total cellular levels of these piRNAs; only the total cellular level is 
functionally relevant. The TAS2 and HeT-A1 piRNAs are expressed 
at similar levels in the wild-type and piwi mutant flies, either with or 
without P{w+,ry+}A4-4 (Supplementary Fig. 7b, c). This suggests 
that, even if P{w+,ry+}A4-4 has a cis effect on the transcription of 
these piRNAs from 3R-TAS, this effect is insignificant on their 
cellular levels and therefore should not have a functional impact. 
3R-TAS1 piRNA therefore provides a unique opportunity to assess 
the cis effect of P{w+,ry+}A4-4 on 3R-TAS and its effect on germline 
stem-cell maintenance. 

To further confirm the effect of P-element insertion on the 
expression of its nearby piRNA, we examined the expression of the 
2R-42AB-B1 piRNA uniquely mapped to the 42AB region (piRNA 
cluster 17; Supplementary Table 1 and Supplementary Fig. 7d). Like 
that of 3R-TAS1 piRNA, the expression of the 2R-42AB-B1 piRNA is 
markedly decreased in piwi mutants. However, a P-element insertion 
170 bp downstream of the 2R-42AB-B1 piRNA-coding region 

Figure 3 | Piwi protein is associated with 3R-TAS and is required for its 
euchromatic state. a, ChIP by anti-Piwi antibody or preimmune serum 
(preim.) in wild-type and piwi mutant followed by quantitative PCR reveals 
that Piwi is strongly associated with the 73-bp TAS(D) region. The relative 
enrichment is calculated by normalizing the quantity of 3R-TAS genomic 
DNA against the quantity of Rp49 genomic DNA. Grey columns, ry”; black 
columns, piwi?/piwi’;ry’”*. b, ChIP by anti-Myc antibody in wild-type and 
myc–piwi transgenic flies shows that Myc–Piwi is specifically associated with 
the TAS(D) region but not with three regions 2-10 kb near (N) the 
representative piRNA clusters (no. 9, no. 11 and no. 16), or non-telomeric 
HeT-A and TART homologous regions on chromosome 4, or even TAS(P). 
The relative enrichment is the ratio of the enrichment in myc–piwi flies to 
that in w!?”° flies. c, ChIP reveals that TAS(D) has both euchromatic and 
heterochromatic histone modifications. Relative enrichment was calculated 
by normalizing the quantity of TAS(D) DNA co-precipitated by various 
antibodies against that of control without antibody. n.c., negative control, 
containing no DNA, for quantitative PCR. d, Association of modified 
histones H3K4me2, H3K4me3, H3K9ac, H3K9me2, H3K9me3 and HP1 
with TAS(D) was assayed by ChIP and quantitative PCR. The relative 
enrichment of modified histones was calculated by normalizing the quantity 
of TAS(D) DNA against the quantity of Actin5C. Grey columns, ry’”’; black 
columns, piwi?/piwi* ry”; hatched columns, piwi’/piw?’; 208 P{w+,ry+}A4-4. 
Each individual experiment was repeated at least three times. Error 
bars indicate s.d. 




decreases rather than increases the expression of this piRNA. Thus, 
P-element insertions at different sites exert divergent effects on the 
expression of their nearby piRNAs. 

We then investigated whether the positive role of Piwi in regulat- 
ing 3R-TAS1 piRNA expression is due to its direct effect on the 
epigenetic state of 3R-TAS. We first examined whether Piwi directly 
localizes to 3R-TAS(D), a 73-bp region spanning the 3R-TAS1 
piRNA-coding sequence (D here stands for distal; Fig. 2a), by Piwi- 
chromatin immunoprecipitation (ChIP). Piwi is associated with the 
TAS repeats at a level 46-fold that of the housekeeping gene Rp49 
(Fig. 3a). This specific association is further confirmed by Myc—Piwi 
ChIP with an anti-Myc antibody. Myc—Piwi is enriched in TAS(D) 
16.6-fold over an intergenic region on chromosome 2 (Fig. 3b), and 
11.4-fold and 11.5-fold over the succinate dehydrogenase B and 
actin88F genes, respectively (Supplementary Fig. 8). In contrast, 
Piwi is not associated with any of the five piRNA-poor genomic 
regions examined, or even with a proximal 3R-TAS sequence 
(TAS(P)) only 387 bp away (Figs 2a and 3b, and Supplementary 
Fig. 8). Piwi is therefore strongly associated with TAS(D), which 
transcribes the 3R-TAS1 piRNA. 
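
The two normalizations used for the ChIP data (target quantity against a reference locus, then a ratio between genotypes) can be sketched as below. The quantities are invented to reproduce fold values of the size quoted in the text and are not measurements from the paper; the function names are hypothetical.

```python
def relative_enrichment(target_quantity: float, reference_quantity: float) -> float:
    """Quantity of target DNA in the ChIP, normalized to a reference locus."""
    return target_quantity / reference_quantity


def fold_over_control(sample_enrichment: float, control_enrichment: float) -> float:
    """Ratio of the enrichment in a test genotype to that in a control genotype."""
    return sample_enrichment / control_enrichment


# Hypothetical qPCR quantities (arbitrary units), chosen only for illustration.
tas_vs_rp49 = relative_enrichment(target_quantity=92.0, reference_quantity=2.0)

# Hypothetical enrichments in tagged and untagged flies.
myc_piwi_fold = fold_over_control(sample_enrichment=8.3, control_enrichment=0.5)

print(tas_vs_rp49, myc_piwi_fold)
```

The first ratio corresponds to the kind of figure reported for anti-Piwi ChIP against Rp49, and the second to the anti-Myc ChIP comparison between tagged and untagged flies.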

We then characterized the epigenetic states of 3R-TAS(D) by 
ChIP. In wild-type flies, 3R-TAS(D) is associated with both euchro- 
matic modification markers (H3K4me2, H3K4me3 and H3K9ac, the 
last of these being a transcriptional marker) and heterochromatic 
modification markers (H3K9me2, H3K9me3, H4K12ac and HP1), 
suggesting that 3R-TAS(D) might be under dynamic equilibrium 
between euchromatic transcription and heterochromatic silencing 
(Fig. 3c). Although Piwi has a global function in heterochromatic 
silencing (Supplementary Results and Supplementary Fig. 9), it 
has the opposite effect on 3R-TAS(D). In piwi mutants, TAS(D)- 
associated H3K9ac, H3K4me2 and H3K4me3 levels are decreased 
4.5-fold, 3.3-fold and 7.0-fold, respectively (Fig. 3c). In contrast, 
TAS(D)-associated HP1, H3K9me2 and H3K9me3 are enriched 
2.2-fold, 1.3-fold and 1.5-fold, respectively. Similarly, in the 172-bp 
repeat region, levels of the three euchromatic markers were decreased 
more than in the 73-bp region, and HP1 was enriched 10.3-fold, even 
though H3K9me2 and H3K9me3 levels were decreased by 31% and 
41%, respectively (Supplementary Fig. 10). These histone modifica- 
tion profiles are consistent with our finding that 3R-TAS1 piRNA is 
expressed in the wild-type fly but not in piwi mutants, indicating that 
Piwi promotes the euchromatic feature of TAS(D) chromatin. 

To further test the positive epigenetic role of Piwi towards TAS(D), 
we examined the histone modification profile of this region in
piwi1/piwi1;P{w+,ry+}A4-4/+ flies, in which 3R-TAS1 piRNA
expression is restored. Indeed, we found that the euchromatic feature
of TAS(D) is significantly restored (Fig. 3d). In comparison with
the piwi1 mutant, the TAS-associated H3K9ac in piwi1;P{w+,ry+}A4-4
flies is increased 3.1-fold, reaching about 70% of the wild-type
level. Correspondingly, TAS-associated HP1, H3K9me2 and
H3K9me3 levels are decreased to those of the wild type. This significant
restoring effect of P{w+,ry+}A4-4 is unlikely to be due to other
cryptic effects of P{w+,ry+}A4-4 itself, because the same sequence
inserted into other genomic sites does not suppress the piwi mutant
phenotype. Instead, it is likely that the insertion of P{w+,ry+}A4-4,
as a 20-kb unique sequence, affects heterochromatization in
3R-TAS. P{w+,ry+}A4-4 therefore rescues the germline stem-cell
phenotype of the piwi mutant by restoring the euchromatic feature 
of 3R-TAS and the transcription of 3R-TAS1 piRNA. 

The positive epigenetic role of Piwi in TAS(D) is also demonstrated
by the expression of a reporter gene, white, in the P{w+,ry+}A4-4
insertion. This white gene exhibits a typical telomere position
effect (Fig. 4), suggesting that TAS is under heterochromatic influence.
As expected, loss of piwi function enhances the telomere position
effect in a dosage-sensitive manner, opposite to the known suppression
effects of Polycomb group proteins (Fig. 4a). In P{w+,ry+}A4-4
stocks, the white gene was variably expressed in different ommatidia
in piwi+/piwi+ flies, with an overall orange eye colour with
spotty red ommatidia in the posterior region. In piwi2/piwi+ flies,
the eye colour becomes lighter. In piwi2/piwi2 flies, the eye colour
turns completely white in most ommatidia, leaving the posterior red
ommatidia apparently unaffected by the piwi dosage. This indicates
that piwi is required for the expression of white inserted in 3R-TAS.
Eye pigment assay shows that white is expressed in piwi2 homozygotes
at a level 5.0–7.8-fold lower than in piwi2 heterozygotes (Fig. 4b).
Because piwi does not affect the expression level of the endogenous 
white gene on the X chromosome (Fig. 4b), piwi is probably required 
for the expression of the white reporter gene by promoting the active 
epigenetic state of 3R-TAS. 

The above findings reveal the complexity of small RNA-mediated 
epigenetic regulation, namely that Piwi can exert opposite effects on 
different genomic regions. piRNAs may have a function in guiding 
Piwi to the target sites, yet the opposite effects of Piwi at different 
target sites might be mediated by the local chromatin context, which 
would render the selective binding of the Piwi-piRNA complex to 
different partners such as HP1a or JmjC domain-containing histone
demethylases. The activating effect of Piwi, 3R-TAS1 piRNA
and possibly other piRNAs in 3R-TAS can be explained by a
heterochromatin/euchromatin counterbalance model in which the
repetitive 3R-TAS is by default a substrate for heterochromatization.
The heterochromatic state could be established and
maintained by the Polycomb group proteins or an RNAi pathway
mediated by Ago proteins, or yet another mechanism. However,
the association of Piwi with 3R-TAS by means of the 3R-TAS1
piRNA or the P{w+,ry+}A4-4 insertion as a roughly 20-kb unique
sequence counteracts the heterochromatization (Supplementary 
Discussion and Supplementary Fig. 11). Our finding that Piwi is 
required for the epigenetic activation of the subtelomeric region 
starts to reveal a mechanism underlying epigenetic regulation and 
stem-cell maintenance. 


METHODS SUMMARY 

piRNA cloning, mapping and annotation. Myc–Piwi ribonucleoprotein complexes
were immunoprecipitated from adult ovarian tissues with monoclonal
anti-Myc antibody (9E10). Total RNA was prepared by using TRIzol. Small
RNAs were gel-purified, cloned and sequenced as described. Cloned small
RNAs were mapped to the D. melanogaster genome assembly, version 5.1. 
Functional annotation was performed with in-house Perl scripts that use BioPerl
modules and the Ensembl API.

RNase protection assay. PCR-amplified template DNAs were cloned into the 
pGEM-T vector (Promega). High-specific-activity probes were generated by 
in vitro transcription with the MaxiScript T7/Sp6 kit (Ambion) and 
[α-32P]UTP. TRIzol-extracted RNA (20 µg) was hybridized overnight with
2 × 10° c.p.m. of radioactive probe at 42 °C. Unpaired RNA was digested by
an RNaseA/RNaseT1 mixture. 

Chromatin immunoprecipitation and quantitative PCR. Nuclei from adult 
flies were isolated and crosslinked with 0.1% formaldehyde. Chromatin was 
fragmented by sonication in RIPA buffer. Antibodies were incubated overnight 
with nuclear extracts at 4°C. Bead-bound DNA was eluted, reverse-crosslinked 
and precipitated. Quantitative PCR was conducted on a Roche LightCycler 2.0 
system with the LightCycler DNA Master SYBR Green I. Sequences of primers 
are provided in Methods. Normalized enrichment values were calculated with a 
standard formula. 


Full Methods and any associated references are available in the online version of
the paper at www.nature.com/nature.

METHODS

Drosophila stocks and cultures. All fly stocks were maintained at 20 °C. For the
immunoprecipitation assay, a Myc–Piwi transgenic strain (G38-1A) and w1118
flies were used. For the RNase protection assay, w1118, w1118;piwi1/piwi1,
w1118;piwi2/piwi2, w1118;P{w+,ry+}A4-4,ry506 (221|516|R86-2|R86-2#3) and
w1118;piwi1/piwi1;P{w+,ry+}A4-4,ry506 (221|516|R86-2|R86-2#3) flies were used. For
the ChIP assay, w1118;ry506, w1118;piwi1/piwi1;ry506,
w1118;piwi1/piwi1;P{w+,ry+}A4-4(221),ry506/+,ry506 and G38-1A flies were used.
For global analysis of histone PTMs, w1118 and w1118;piwi1/piwi1 flies were used.
For the telomere position effect assay and the fly eye pigmentation assay,
Canton-S, piwi1/piwi1, w1118;piwi2/piwi2,
w1118;Sco/CyO;P{w+,ry+}A4-4,ry506 (221|516|R86-2|R86-2#3),
w1118;piwi2/CyO;P{w+,ry+}A4-4,ry506 (221|516|R86-2|R86-2#3) and
w1118;piwi2/piwi2;P{w+,ry+}A4-4,ry506 (221|516|R86-2|R86-2#3) flies were used.
Immunoprecipitation assay and cloning of piRNAs. Adult ovaries (100 pairs) 
were homogenized in an equal volume of ovary lysis buffer (20 mM HEPES
pH 7.5, 100 mM KCl, 5 mM MgCl2, 0.1% SDS, 0.1% sodium deoxycholate,
1% Triton X-100, 1 mM dithiothreitol, 0.2 mM phenylmethylsulphonyl fluoride,
1× Complete Mini, EDTA-free Proteinase Inhibitor cocktail (Roche),
0.5 unit µl⁻¹ RNaseOUT (Invitrogen), 5% glycerol). Anti-Myc monoclonal
antibody (9E10; Developmental Studies Hybridoma Bank at the University of 
Iowa) was added at 1:10 dilution to the precleared ovary lysate and incubated for 
2h at 4°C. Protein G-Sepharose 4B beads were added and incubated for a 
further 1h at 4°C. Bead-bound RNAs were extracted with TRIzol (Invitrogen) 
and precipitated in the presence of 50 µg ml⁻¹ GlycoBlue (Ambion). The
co-precipitated RNA was 5' labelled with [γ-32P]ATP by T4 polynucleotide kinase
(NEB), purified on a Sephadex G-25 fine RNA spin column (Roche) and separated
by 15% denaturing PAGE for detection. A PCR-amplified small RNA
library (1 µg) was directly sequenced with a large-scale pyrosequencing method
(454 Life Sciences). 

Genome mapping and annotation of cloned small RNAs. Cloned small RNAs 
were mapped to both the D. melanogaster genome assembly, version 5.1, and 
annotated sequence databases to infer the likely genome origins. The mapping 
was performed with a standalone NCBI BLAST program (http://www.ncbi.nlm. 
nih.gov/BLAST/download.shtml), with sensitive parameters to identify perfect 
matches of short query sequences (no mismatch, insertion or deletion is 
allowed). In-house Perl scripts using BioPerl modules (http://www.bioperl.org)
and Ensembl API (http://www.ensembl.org/info/software/core/core_tutorial. 
html) were used to automate the mapping, annotation and analysis procedure. 
A home-made non-coding RNA database was built by combining the available 
databases with batch-fetched entries with appropriate feature keys from 
Ensembl. The sequences and annotations of transposons and repetitive 
sequences in Drosophila melanogaster genome assembly, version 5.1, were 
identified by a standalone RepeatMasker program with a sensitive filter 
(http://www.repeatmasker.org). 
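
The perfect-match criterion described above (no mismatch, insertion or deletion) can be illustrated with a minimal sketch. This is not the BLAST-based pipeline used in the study; it is a plain exact-string search on both strands, and the function names and the toy genome dictionary are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def find_perfect_matches(query: str, genome: dict) -> list:
    """List (chromosome, 0-based start, strand) for every exact match.

    `genome` maps a chromosome name to its sequence. RNA queries are
    accepted: U is converted to T before searching.
    """
    query = query.upper().replace("U", "T")
    hits = []
    for chrom, seq in genome.items():
        seq = seq.upper()
        for strand, pattern in (("+", query), ("-", reverse_complement(query))):
            start = seq.find(pattern)
            while start != -1:
                hits.append((chrom, start, strand))
                # Step by one base so overlapping matches are also reported.
                start = seq.find(pattern, start + 1)
    return sorted(hits)
```

A query that maps nowhere returns an empty list, which corresponds to a cloned small RNA that would be left unannotated under this strict criterion.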

A piRNA was annotated to a specific type of genomic sequence only if the 
sequence covered the full length of the piRNA. A piRNA cluster was defined as a 
genomic region containing more than 100 piRNA mapped sequences and not 
having any piRNA mapped sequence in the upstream 5-kb region or the down- 
stream 5-kb region. The borders of a piRNA cluster were defined by the first 
nucleotide of the farthest upstream piRNA and the last nucleotide of the farthest 
downstream piRNA. 
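
The cluster definition above can be expressed as a short sketch, assuming mapped piRNAs on one chromosome are given as 1-based inclusive (start, end) coordinates. The function name and parameters are hypothetical, and strand is ignored here, which the text does not specify.

```python
def call_pirna_clusters(hits, min_count=100, flank=5000):
    """Group mapped piRNAs into clusters on a single chromosome.

    hits      : iterable of (start, end) pairs, 1-based inclusive
    min_count : a cluster must contain MORE than this many mapped sequences
    flank     : size (bp) of the piRNA-free region required on each side

    Returns a list of (cluster_start, cluster_end, n_mapped) tuples, where the
    borders are the first nucleotide of the farthest upstream piRNA and the
    last nucleotide of the farthest downstream piRNA.
    """
    clusters = []
    current = []      # piRNAs in the group being built
    current_end = 0   # right-most coordinate seen in the current group

    def close_group():
        if len(current) > min_count:
            clusters.append((current[0][0], current_end, len(current)))

    for start, end in sorted(hits):
        # A new group begins when the flank after the current group is empty,
        # that is, the next piRNA starts beyond current_end + flank.
        if current and start - current_end > flank:
            close_group()
            current = []
            current_end = 0
        current.append((start, end))
        current_end = max(current_end, end)

    close_group()
    return clusters
```

Because the hits are sorted and split only at piRNA-free gaps longer than the flank, every reported cluster automatically has empty 5-kb regions on both sides.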

Chromatin immunoprecipitation and quantitative PCR. Nuclear extract was 
incubated at 4°C for 2h separately with the following antibodies: anti-Piwi 
(generated in the Lin laboratory against Piwi C-terminal peptide), 1:100 dilu- 
tion; preimmune of anti-Piwi, 1:10 dilution; anti-Myc monoclonal antibody 
(9E10; Developmental Studies Hybridoma Bank, University of Iowa), 1:10 dilu- 
tion; anti-dimethyl-histone3 K4 (Upstate), 1:100 dilution; anti-trimethyl- 
histone3 K4 (Upstate), 1:100 dilution; anti-dimethyl-histone3 K9 (Upstate), 
1:50 dilution; anti-trimethyl-histone3 K9 (Upstate), 1:50 dilution; anti-acetyl- 
histone3 K9 (Upstate), 1:100 dilution; anti-acetyl-histone4 K12 (Upstate), 1:100
dilution; anti-dimethyl-histone3 K27 (Upstate), 1:50 dilution; and anti-HP1 (Covance),
1:50 dilution. Salmon-sperm DNA (50 µl)/Protein A/G-agarose beads (Upstate)
was added and samples were incubated at 4°C for 1h. After washing, cross- 
linking was reversed at 65 °C overnight. DNA was purified from the beads by 
phenol/chloroform extraction and ethanol precipitation. 

The following primer sets were used for quantitative PCR: for the TAS(D)
region, 5’-gtgtctcatccatttcctttattcag-3’ (forward) and 5’-tggtcgtgttgatcggtacttg- 
3’ (reverse); for a 169-bp fragment of the 172-bp repeat region, 5'-gatcttctta- 
catttcccttcttcaac-3’ (forward) and 5’-cggcagaggcacgaacaac-3’ (reverse); for a 
345-bp region in the proximal end of the 3R-TAS repeat, 5’-caacccaatcggacct- 
cactt-3’ (forward) and 5'-gtgacgattaatacgaaaacttacaaac-3’ (reverse); for an 81- 
bp region about 2 kb upstream of the piRNA cluster no. 9, 5’-aaatgcagcagg- 
cagcgcgaa-3' (forward) and 5’-cctcaatatgtagagtagtgcgagtgactt-3' (reverse); for 
an 86-bp region about 8 kb downstream of the piRNA cluster no. 11,
5'-gctcaagagtcctccagacagctt-3' (forward) and 5'-gcagtgatggtgstggcagtt-3' (reverse); for a
128-bp region about 10 kb downstream of the piRNA cluster no. 16,
5'-agggttatgctaggttcttatgctgcc-3' (forward) and 5'-ggaaacgaataaacaaatgggtcaaca-3'
(reverse); for a 164-bp non-telomeric HeT-A homologous region on chromosome 4,
5'-agtcgatgttaaatccattccg-3' (forward) and 5'-tgggttacttgtcctatgtgcc-3'
(reverse); for a 129-bp non-telomeric TART homologous region on
chromosome 4, 5’-tcaacatacgcagacgacact-3' (forward) and 5'-agcatttactgaga- 
taccccatt-3’ (reverse); for a 97-bp region in the Rp49 gene, 5’-tcgagttgaactgcgt- 
tagtccgt-3’ (forward) and 5’-gccatttggcgaactttcacagga-3’ (reverse); for a 224-bp 
intergenic region [2R:11,327,115..11,327,338], 5’-ttgggaggcagggaaaggatg-3' 
(forward) and 5'-aaggaggagaatgctaaggagatggat-3' (reverse); for a 165-bp region
in the Succinate Dehydrogenase B gene (SDHB), 5'-ccaggaaccaggtgagtggagtgc-3'
(forward) and 5'-gcgtgcttatctgtggcctttctatc-3' (reverse); for a 187-bp region in
the Actin88F gene, 5'-aagctcttcaaaggcagcaaccag-3' (forward) and
5'-aaatggccatgaaggatgagcacc-3' (reverse); for a 334-bp region in the Actin5C gene,
5'-tgcccgacggacaggtgat-3' (forward) and 5'-tggaaggtggacagcgaagc-3' (reverse).
Using a standard formula, the quantities of target genomic regions precipitated 
by different antibodies were normalized against those of Rp49, the 2R intergenic 
region, Actin5C, SDHB and Actin88F. This ratio was further normalized against 
those from input DNA (without ChIP). The relative quantities of the TAS(D) 
region precipitated by different antibodies in Fig. 3c were normalized against the 
quantity of TAS(D) region precipitated by naked beads without any antibody. 
The difference in amplification efficiency between the various primer sets was 
determined by standard curves and was taken into account in the normalization 
calculation. The average values and standard deviations of the relative enrich- 
ments were used to compare the association of Piwi in different genomic regions 
and the histone modification profiles of the TAS region in different genotypes. 
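
The text refers only to "a standard formula", so the following is a sketch of one common way such a normalization is computed, not necessarily the calculation used here. It assumes quantities are derived from threshold cycles (Ct) as efficiency^(-Ct), that primer efficiency is estimated from the slope of a standard curve (Ct against log10 of template amount), and that the target/reference ratio in the ChIP sample is divided by the same ratio in input DNA. All function and parameter names are hypothetical.

```python
def efficiency_from_slope(slope: float) -> float:
    """Amplification efficiency per cycle from a standard-curve slope.

    A slope of about -3.32 corresponds to perfect doubling (efficiency 2.0).
    """
    return 10 ** (-1.0 / slope)


def relative_quantity(ct: float, efficiency: float = 2.0) -> float:
    """Relative template amount implied by a threshold cycle."""
    return efficiency ** (-ct)


def normalized_enrichment(ct_chip_target, ct_chip_ref,
                          ct_input_target, ct_input_ref,
                          eff_target=2.0, eff_ref=2.0):
    """Target/reference ratio in ChIP DNA divided by the same ratio in input."""
    chip_ratio = (relative_quantity(ct_chip_target, eff_target)
                  / relative_quantity(ct_chip_ref, eff_ref))
    input_ratio = (relative_quantity(ct_input_target, eff_target)
                   / relative_quantity(ct_input_ref, eff_ref))
    return chip_ratio / input_ratio
```

Passing a separate efficiency for each primer set is how differences in amplification efficiency between primer sets enter the calculation in this sketch.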
Drosophila eye pigmentation assay. Fifty heads from 5–8-day-old flies were
homogenized in 400 µl of acidified methanol (0.1% HCl in methanol) and vortex-mixed
for 30 min at room temperature. After centrifugation for 5 min, 20 µl of 0.5%
H2O2 was added to the supernatant. After centrifugation for 10 min, the absorbance
at 480 nm was measured. Samples for each genotype were repeated three
times. The average absorbance and standard deviations were used to quantify eye 
colours. 
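
The summary statistics described above can be sketched as follows. The choice of the sample standard deviation (n - 1 denominator) is an assumption, since the text does not say which estimator was used, and the function names are hypothetical.

```python
from statistics import mean, stdev


def summarize_absorbance(readings):
    """Return (mean, sample standard deviation) of replicate A480 readings."""
    return mean(readings), stdev(readings)


def fold_difference(reference_readings, sample_readings):
    """Ratio of mean absorbance: how many fold lower the sample is than the reference."""
    return mean(reference_readings) / mean(sample_readings)
```

With three replicate readings per genotype, `fold_difference` applied to heterozygote and homozygote readings gives the kind of fold reduction reported for the white reporter.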
Electrophoretic mobility-shift assay. Piwi full-length complementary DNA 
with a 5’ Myc tag sequence was cloned into the pBlueScript KS vector 
(Stratagene). Myc–Piwi protein was translated in vitro with a TNT T7
Coupled Wheat Germ Extract System (Promega) and standard procedures.
Mock in vitro translation was performed without pBS-KS:myc–piwi DNA template.
Increasing amounts of concentrated Myc–Piwi and mock in vitro translation
reactions were incubated at room temperature for 30 min with 5 fmol
(about 10* c.p.m.) of 5’ end-labelled 3R-TAS1 piRNA probe or 3R-TAS1 
piRNA probe with the 5’ U replaced with C, in the presence or absence of 
different amounts of unlabelled RNA oligonucleotides (10 and 50 fmol). The 
binding was done in 1× binding buffer (20 mM HEPES pH 7.5, 150 mM NaCl,
2 mM MgCl2, 1 mM dithiothreitol, 10% glycerol, 15 nM yeast tRNA).

JHDM1B/FBXL10 is a nucleolar protein that represses
transcription of ribosomal RNA genes


David Frescas, Daniele Guardavaccaro, Florian Bassermann, Ryo Koyama-Nasu & Michele Pagano


JHDM1B is an evolutionarily conserved and ubiquitously
expressed member of the JHDM (JmjC-domain-containing histone
demethylase) family. Because it contains an F-box motif,
this protein is also known as FBXL10 (ref. 4). With the use of
a genome-wide RNAi screen, the JHDM1B worm orthologue
(T26A5.5) was identified as a gene that regulates growth. In the
mouse, four independent screens have identified JHDM1B as a
putative tumour suppressor by retroviral insertion analysis.
Here we identify human JHDM1B as a nucleolar protein and
show that JHDM1B preferentially binds the transcribed region
of ribosomal DNA to repress the transcription of ribosomal 
RNA genes. We also show that repression of ribosomal RNA genes 
by JHDM1B is dependent on its JmjC domain, which is necessary 
for the specific demethylation of trimethylated lysine 4 on histone 
H3 in the nucleolus. In agreement with the notion that ribosomal 
RNA synthesis and cell growth are coupled processes, we show a 
JmjC-domain-dependent negative effect of JHDM1B on cell size 
and cell proliferation. Because aberrant ribosome biogenesis and 
the disruption of epigenetic control mechanisms contribute to 
cellular transformation, these results, together with the low levels 
of JHDM1B expression found in aggressive brain tumours, suggest 
a role for JHDM1B in cancer development. 

To begin assessing the biological function of JHDM1B, we inves- 
tigated its cellular localization by using indirect immunofluorescence 
analysis. The results presented in Fig. 1a and Supplementary Fig. 1a
show the localization of endogenous JHDM1B to the nucleolus of 
human cell lines (HeLa and T98G) and non-immortalized, non- 
transformed fibroblasts (IMR90). The nucleolar localization was 
confirmed by simultaneous staining with the nucleolar protein 
nucleophosmin/B23. Knockdown of JHDM1B by short interfering 
RNA (siRNA) eliminated this nucleolar signal, demonstrating the 
specificity of the immunostaining (Fig. 1a). JHDM1B has a putative
and evolutionarily conserved nucleolar localization signal (NoLS)
(Supplementary Fig. 1b). Comparison between wild-type JHDM1B
and a mutant lacking the NoLS (JHDM1B(NoLS)) revealed the
requirement of this basic amino-acid motif in the targeting of
JHDM1B to the nucleolus (Fig. 1b). Disruption of other domains
by point mutation or deletion failed to disrupt the localization of 
JHDM1B to the nucleolar compartment (Supplementary Fig. 2). 
Last, green fluorescent protein (GFP)-tagged JHDM1B localized to 
the nucleolus of living HeLa cells (Supplementary Fig. 3). Taken 
together, these data show that JHDM1B is a nucleolar protein, and 
its subcellular localization is dependent on the presence of a NoLS 
motif. 

The nucleolar localization of JHDM1B, together with the presence 
of a CXXC zinc-finger DNA-binding domain, suggested that this 
protein might bind ribosomal DNA. During mitosis, rDNA-binding
proteins, such as RNA polymerase I (pol I) and the pol I transcription
factor UBF (upstream binding factor), remain associated with
chromosomes and display discrete foci at nucleolar organizing
regions (NORs), which mark transcriptionally competent nucleation
points for nucleoli from the previous interphase. JHDM1B
localized together with UBF at NORs in mitotic cells (Supplementary
Figs 4 and 5). Moreover, JHDM1B continued to localize to the
nucleolar compartment after pretreatment with Triton X-100
(Supplementary Fig. 6), similar to UBF, suggesting stable association
with rDNA.

To confirm the binding of JHDM1B to rDNA and to obtain a 
high-resolution map for this binding, we performed chromatin 
immunoprecipitation (ChIP) assays followed by quantitative real- 
time polymerase chain reaction (PCR) using primer pair sets that 
span the entire human rDNA repeat. Mapping of JHDM1B binding
throughout the rDNA locus showed that JHDM1B bound mainly to 
the transcribed region of rDNA, with particular enrichment at 8 
kilobases (kb) (Fig. 1c—e). The specificity of the binding to rDNA 
was confirmed by the lack of JHDM1B enrichment on the glyceral- 
dehyde-3-phosphate dehydrogenase (GAPDH) promoter and by the 
fact that a DNA-binding mutant, JHDM1B(CXXC), failed to bind to 
the 8-kb region (Fig. 1e, f). CXXC zinc-finger domains specifically
recognize and bind unmethylated CpG-rich regions. This binding
was confirmed in vitro for JHDM1B. CpG-island prediction by the
bioinformatics program cpgplot (http://www.ebi.ac.uk/emboss/cpgplot)
and previously reported biochemical studies show enrichment
of CpG-rich regions in rDNA over the transcribed region of
rDNA (Supplementary Fig. 7), where JHDM1B binds. Thus, 
JHDM1B binds across the transcriptionally competent region of 
the rDNA repeat, with particular affinity for regions containing ele- 
vated CpG frequency. 
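
The idea of scoring CpG frequency along a sequence can be illustrated with a minimal sketch. It does not reproduce cpgplot or its thresholds; it assumes the commonly used observed/expected CpG ratio, (number of CpG × window length) / (number of C × number of G), together with the GC fraction, computed in sliding windows. Function names and default window size are hypothetical.

```python
def cpg_stats(window: str):
    """Return (GC fraction, observed/expected CpG ratio) for one window."""
    w = window.upper()
    n = len(w)
    c = w.count("C")
    g = w.count("G")
    cpg = w.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c + g) / n if n else 0.0
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def scan_cpg(seq: str, window: int = 100, step: int = 1):
    """Yield (0-based start, GC fraction, obs/exp CpG) for each full window."""
    for start in range(0, len(seq) - window + 1, step):
        gc_fraction, obs_exp = cpg_stats(seq[start:start + window])
        yield start, gc_fraction, obs_exp
```

Windows with both a high GC fraction and a high observed/expected ratio are the candidates that a CpG-island predictor would go on to merge and filter by length.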

To determine whether JHDM1B has a function in regulating 
rDNA transcription in vivo, we measured the levels of the 45S pre- 
ribosomal RNA by quantitative reverse transcriptase-mediated PCR 
(qRT-PCR). Silencing of JHDM1B by siRNA resulted in a significant
increase in the expression of pre-rRNA compared with control 
siRNA (Fig. 2a), suggesting that JHDM1B is a repressor of rDNA 
transcription. Accordingly, we observed that cells expressing exogen- 
ous JHDM1B displayed a marked decrease in pre-rRNA synthesis 
compared with mock-transfected cells (Fig. 2b). 

To investigate the domains required for JHDM1B-mediated 
repression of rRNA in vivo, we measured pre-rRNA levels in cells 
expressing JHDM1B mutants. Loss of the PHD finger or the F-box 
domain resulted in continued repression by JHDM1B, whereas loss 
of the NoLS or the CXXC zinc-finger blocked this transcriptional 
repression (Fig. 2b). We also found that the JmjC histone demethy- 
lase domain was required for the transcriptional repression of rDNA 
by JHDM1B (Fig. 2b). Moreover, a significant portion of cells expres- 
sing JHDM1B also failed to incorporate BrUTP within the nucleolus 
in a JmjC-domain-dependent manner (Fig. 2c). Taken together, 
these data show that JHDM1B represses rDNA transcription. This
repression requires the NoLS motif (to allow nucleolar localization)
and the CXXC zinc-finger domain (to bind DNA). In addition, the 
repressive activity of JHDM1B required its JmjC domain. 

To investigate the requirement for the JmjC domain, we considered
previous studies showing that JHDM1A demethylates dimethylated
lysine 36 on histone 3 (H3K36me2) by means of this motif. We
conducted an immunofluorescence analysis in HeLa cells transfected
with constructs encoding JHDM1A or JHDM1B by using an antibody
against H3K36me2, and we found that forced expression of
JHDM1A significantly decreased the levels of H3K36me2, as reported
previously, whereas JHDM1B had no effect (Supplementary Fig. 8).
JHDM1B was also unable to produce any significant changes in
the methylation status of H3K36me3, H3K9me3, H3K27me3 or 
H3K4me2 (Supplementary Fig. 9). In contrast, JHDM1B overexpres- 
sion resulted in a significant decrease in H3K4me3 levels, and this 
decrease required the presence of the JmjC domain (Fig. 3a, b). The 
specific reduction of H3K4me3 levels by JHDM1B was confirmed by 
immunoblotting (Fig. 3c, d). Last, incubation of core histones with 
immunopurified JHDM1B confirmed in vitro the ability of JHDM1B
to demethylate H3K4me3 (Supplementary Fig. 10). Taken together,
these results demonstrate that JHDM1B is a histone demethylase that
catalyses the demethylation of H3K4me3. This finding is significant 
because active rRNA genes are reported to display active, euchro- 
matic features that include H3K4me3 (refs 16, 17). 

Because the decrease in global H3K4me3 levels was probably due to 
leakage of exogenous JHDM1B into the nucleoplasm, we investigated 
whether local changes in H3K4 trimethylation levels occur on rDNA by 
using an anti-H3K4me3 antibody in ChIP analysis. Downregulation of 
JHDM1B resulted in an increase in H3K4me3 on rDNA, particularly in 
the promoter region (42.9 kb) (Fig. 3e). Accordingly, we found a sig- 
nificant decrease in the levels of H3K4me3 at the rDNA locus in cells 
ectopically expressing JHDM1B (Fig. 3f).

Interestingly, ectopic expression of JHDM1B decreased the occu- 
pancy of UBF at rDNA regions previously reported to be enriched for 
the protein (Fig. 3g). Similarly, JHDM1B expression decreased the
amount of chromatin-bound UBF in a JmjC-dependent manner, 
whereas depletion of JHDM1B increased the binding of UBF to chromatin
(Supplementary Figs 11 and 12).

Figure 1 | JHDM1B localizes to the nucleolus and associates with rDNA.
a, Indirect immunofluorescence analysis of endogenous JHDM1B in HeLa 
cells treated with control siRNA oligonucleotides or oligonucleotides 
targeting JHDM1B mRNA. Cells were stained with antibodies against 
JHDM1B and nucleophosmin/B23 and DAPI (to visualize DNA), as 
indicated. b, Indirect immunofluorescence analysis of HeLa cells transfected 
with an empty vector (EV) or constructs encoding Flag-tagged JHDM1B or 
JHDM1B(NoLS), as indicated. Cells were stained with an anti-Flag antibody,
an antibody against B23 and DAPI. c, Schematic representation of a single
human rDNA repeat. d, JHDM1B binds across the transcriptionally 
competent region of rDNA. Enrichment of rDNA obtained with anti-Flag 
antibody as determined by ChIP analysis using chromatin prepared from
HeLa cells transfected with Flag-EV (open columns) or a construct encoding
Flag-tagged JHDM1B (filled columns). DNA binding was quantified by real- 
time PCR with primer sets at indicated regions along the rDNA. The value 
given for the amount of PCR product present in EV-transfected cells was set 
at 1. e, Analysis of endogenous JHDM1B enrichment on rDNA and the 
GAPDH promoter by ChIP from chromatin of HeLa cells. JHDM1B 
enrichment (filled columns) was quantified by real-time PCR with the 
indicated primer sets. The value given for the amount of PCR product 
present from ChIP with control IgG (anti-haemagglutinin; open columns) 
was set as 1. f, ChIP experiment for enrichment of rDNA at 8 kb (filled 
columns) and the GAPDH promoter (open columns) was performed as in 
d. All error bars represent s.d. (n = 3).

Because the synthesis of rRNAs and cell growth are coupled
processes, we investigated whether JHDM1B affected cell size and
cell proliferation. Forward scatter analysis showed that HeLa cells 
exogenously expressing JHDM1B were significantly smaller than 
control or JHDM1B(JmjC)-expressing cells (Fig. 4a). This effect also 
required the NoLS and the CXXC zinc-finger domains but not the 
PHD domain (Supplementary Fig. 13). Cells expressing JHDM1B 
also incorporated about 40% less bromodeoxyuridine (BrdU) than 
control cells, reflecting a smaller population of cells in S phase 
(Fig. 4b). Conversely, cells in which JHDM1B was silenced were 
significantly larger than control cells and showed a 1.67-fold increase 
in BrdU incorporation (Fig. 4c, d). Accordingly, JHDM1B-depleted
cells proliferated at a faster rate than control HeLa cells (Fig. 4e). 
Taken together, these data suggest that JHDM1B-mediated modu- 
lation of rRNA gene expression influences cell growth and prolifera- 
tion. Interestingly, under conditions in which rRNA production has 
been shown to decrease, such as during serum deprivation, the

Figure 2 | JHDM1B represses transcription of rRNA genes via the JmjC
domain. a, qRT-PCR analysis of pre-rRNA (filled columns) and JHDM1B 
mRNA (open columns) from HeLa cells transfected with siRNA targeting 
JHDM1B compared with control. The value given for the amount of pre- 
rRNA PCR product present in control cells was set as 1. The value given 
for the amount of JHDM1B PCR product present in control cells was set at 5. 
b, Top: qRT-PCR analysis of pre-rRNA in HeLa cells retrovirally infected 
with an empty vector (EV) or constructs encoding wild-type JHDM1B or the 
indicated mutants. The value given for the amount of pre-rRNA present in 
empty vector (EV) transfected cells was set as 1. Bottom: expression of 
JHDM1B proteins, as analysed by immunoblotting (α-tubulin was used as a
loading control). c, Graphical representation of BrUTP incorporation in the
nucleolus of HeLa cells transfected with EV or constructs encoding JHDM1B
or JHDM1B(JmjC). All error bars represent s.d. (n = 3).

Figure 3 | JHDM1B demethylates H3K4me3 on the rDNA locus. a, Indirect
immunofluorescence analysis of H3K4me3 levels in HeLa cells transfected 
with vectors encoding JHDM1B or JHDM1B(JmjC). Cells were stained with 
antibodies against Flag and H3K4me3, and with DAPI, as indicated.
b, Quantification of three experiments performed as in a. The value given for
H3K4me3 present in EV-transfected cells was set at 100%. Error bars
represent s.d. (n = 3). c, Levels of methylated histone H3 in HeLa cells
infected with an empty retrovirus (EV) or with retroviruses encoding
JHDM1B or JHDM1B(JmjC) and analysed by immunoblotting with
antibodies against the indicated proteins. The bottom panel shows core 
histones stained with Ponceau red. d, Densitometric quantification of 
three experiments performed as in c. The value given for H3K4me3 
present in EV-transfected cells was set at 100%. Error bars represent s.d. 
(n = 3). e, HeLa cells were transfected with control siRNA (open columns) 
or JHDM1B siRNA (filled columns) oligonucleotides, and ChIPs were 
performed as in Fig. 1e. f, Analysis of H3K4me3 enrichment on rDNA,
quantified as in e, using chromatin from 293T cells transfected with Flag-EV 
(open columns) or a construct encoding Flag-JHDM1B (filled columns). 
g, JHDM1B expression induces the dissociation of UBF from rDNA. 
Analysis of UBF enrichment on rDNA, quantified as in e, using chromatin 
from HeLa cells transfected with empty vector (open columns) or a 
construct encoding Flag-JHDM1B (filled columns). Error bars in
e–g represent s.d. (n = 3).

amount of chromatin-bound JHDM1B increased, whereas serum re-addition
transiently abolished this binding (Supplementary Fig. 14),
suggesting that the binding of JHDM1B to rDNA is regulated by 
mitogens. 

Actively proliferating cells transcribe high levels of rDNA and 
often show an increase in nucleolar number. Whereas about
60% of HeLa cells possessed five or six nucleoli, cells ectopically 
expressing JHDM1B had a substantial nucleolar contraction, with 
most JHDM1B-positive cells possessing three or fewer nucleoli per 
cell (Fig. 4f). Consistent with the inability of JHDM1B(JmjC) and 
JHDM1B(NoLS) to repress rRNA expression was the observation
that these mutants were unable to induce nucleolar contraction
(Fig. 4f). Knockdown of JHDM1B resulted in the generation of nuclei
with disorganized nucleoli; the number of nucleoli in JHDM1B
siRNA-treated cells exceeded seven per nucleus (Figs 1a and 4g,
and Supplementary Fig. 15). 

A recent study has established a NOR-based index as a marker for 
grading and staging astrocytic lesions of the brain. The number of
NORs was found to rise in parallel with the grade, stage and prolif- 
erative state of the tumour. To investigate a potential involvement of 
JHDM1B in human cancer, we searched the Oncomine and the NIH- 
GEO online databases for differential JHDM1B expression in normal 
versus tumorigenic tissues. Expression of JHDM1B, but not that
of JHDM1A, was significantly decreased in the most common and
the most aggressive of the primary brain tumours, glioblastoma
multiforme (GBM), relative to normal brain tissue (Fig. 4h and
Supplementary Fig. 16). This decrease in JHDM1B expression was 
correlated with brain tumour type and tumour grade (Fig. 4i). Taken 
together, these results suggest that the decreased expression of 
JHDM1B might contribute to the increased cell growth/proliferation 
and the number of NORs in GBM tumours, possibly resulting from 
aberrant regulation of rRNA expression. 

Thus, we show here that JHDM1B localizes to the nucleolus and
performs a fundamental function in silencing rRNA genes in a manner
dependent on its JmjC domain. Our findings suggest that this
activity involves the demethylation of H3K4me3 at the rDNA locus, 
although we cannot exclude additional nucleolar, non-histone 
targets. Ribosome biogenesis is a highly coordinated process that 
ensures proper cell growth and proliferation by supporting the syn- 
thesis of proteins. Deregulation of this process has been linked to 
multiple forms of human disease, including cancer. Our study provides
critical insight into the contribution of aberrant ribosome biogenesis
and epigenetic mechanisms to cellular transformation. We propose
JHDM1B as an addition to the list of tumour suppressors, such as p53
and pRb, that negatively regulate rRNA gene expression.


METHODS SUMMARY 

RNA interference. HeLa cells were transfected by using HiPerFect transfection 
reagent (Qiagen), in accordance with the manufacturer’s protocol, with either a 
single siRNA oligonucleotide (5’-AGGCAAGUUUAACCUCAUG-3’) or a pool of
four siRNA oligonucleotides (5’-GCAAUAAGGUCACUGAUCAUU-3’,
5’-GACCUCAGCUGGACCAAUAUU-3’, 5’-GGGAGUCGAUGCUUAUUGAUU-3’
and 5’-CAGCAUAGACGGCUUCUCUUU-3’) targeting human JHDM1B
(Dharmacon). Direct comparison of the pool of oligonucleotides and the single 
oligonucleotide (which was not present in the pool) showed identical results. 


Figure 4 | JHDM1B inhibits cell growth and proliferation. a, Cell size was 
determined by fluorescence-activated cell sorting (FACS; forward scatter) in 
HeLa cells retrovirally infected with empty vector (blue) or constructs 
encoding wild-type JHDM1B (red) or JHDM1B(JmjC) (green), as indicated. 
b, Percentage of BrdU-positive HeLa cells transfected with EV or constructs 
encoding Flag-tagged JHDM1B or JHDM1B(JmjC). c, Cell size was
determined by FACS (forward scatter) in HeLa cells transfected with control 
siRNA (blue) or JHDM1B siRNA (red) oligonucleotides. d, Percentage of 
BrdU-positive HeLa cells transfected as indicated. e, Proliferation of HeLa 
cells (transfected with control siRNA (blue) or JHDM1B siRNA (red) 
oligonucleotides) over a five-day period. f, Analysis of the number of 
nucleoli (determined by immunofluorescence with anti-B23 antibody) in 
HeLa cells transfected with EV or with constructs encoding Flag-tagged 
JHDM1B or the indicated JHDM1B mutants. White columns, three or fewer 
nucleoli; grey columns, four nucleoli; black columns, five or six nucleoli. 
g, Experiment performed as in f except that siRNA oligonucleotides against 
JHDM1B were compared with the control. White columns, three or fewer 
nucleoli; grey columns, four nucleoli; black columns, five or six nucleoli; red 
column, seven or more nucleoli. h, Data from ref. 21 (provided by 
Oncomine) reanalysed to show expression levels of JHDM1B in normal 
brain (NB), glioblastoma multiforme (GBM), astrocytic tumours (AT),
oligodendrogliomas (OD) and anaplastic oligoastrocytomas (AO). i, Data 
from ref. 22 (provided by NIH-GEO) reanalysed to show expression levels of 
JHDM1B in grade III and grade IV brain tumours (AA, anaplastic 
astrocytoma; AM, anaplastic mixed oligoastrocytoma). P < 0.040; t = 1.770. 
Where present, error bars represent s.d. (n = 3). 

Transcription analysis by qRT-PCR. RNA was extracted with the use of the
RNeasy kit (Qiagen), and reverse transcriptions were performed as described 
previously’*. qRT-PCR analysis was performed in accordance with standard 
procedures with SYBR Green mix (Bio-Rad). Primer sequences used to detect 
pre-rRNA and the housekeeping gene ARPP P0 were as reported'*. The primer
pairs used for the quantification of pre-rRNA recognize the 5’ external tran- 
scribed spacer (5’ ETS), and because the 5’ ETS is processed rapidly during 
transcription, it accurately reflects the rate of RNA pol I transcription’’. The 
sequences used for JHDM1B were 5’-GAGGAGAAGAAGAAGGTGAAG-3’ and
5'-TTGATGGGCTGCTGGTTC-3’. 
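
As an illustration of how a pre-rRNA signal measured in this way can be expressed relative to the housekeeping gene, the sketch below applies the widely used 2^-ΔΔCt calculation. The text states only that qRT-PCR followed standard procedures, so the choice of this formula, the assumption of roughly 100% primer efficiency, and the function and parameter names are assumptions made for illustration and are not taken from the paper.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target transcript by the 2^-ddCt method.

    Hypothetical helper: assumes both primer pairs amplify with about
    100% efficiency, so one cycle corresponds to a twofold difference.
    """
    # Normalize the target (e.g. pre-rRNA 5' ETS) to the reference gene
    # (e.g. ARPP P0) within each condition.
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    # Compare the treated sample with the control condition.
    delta_delta = delta_sample - delta_control
    return 2.0 ** (-delta_delta)
```

With this helper, a sample whose normalized threshold cycle is two cycles lower than that of the control would be reported as a fourfold increase in transcript level.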

Indirect immunofluorescence. Cells were plated and cultured on chambered 
glass tissue-culture slides (BD Falcon) with complete medium. For immuno- 
fluorescence, cells were washed in PBS, fixed and permeabilized in 100% meth- 
anol at —20°C for 10 min, and then incubated with the primary antibodies for 
1h at 25°C in 0.5% Tween 20 in PBS (0.5% TBST). Slides were washed three 
times in 0.5% TBST for 5 min and incubated with secondary antibodies, diluted 
1:1,000. 4',6-Diamidino-2-phenylindole (DAPI; Molecular Probes) was 
included to reveal nuclei. Slides were washed in PBS and subsequently mounted 
with Aqua Poly/Mount (Polysciences). Images were acquired with a Nikon 
Eclipse E800 fluorescence deconvolution microscope. For pre-extraction, cells 
were rinsed in PBS and incubated in 0.5% Triton X-100 for 5 min on ice followed 
by methanol fixation at —20°C, permeabilization, and the immunofluorescence 
procedure described above. Fluorescence quantification was determined with 
ImageJ software (NIH). 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 

METHODS 

Cell culture. HeLa, T98G, 293T, GP-293 and IMR90 cells were cultured as 
described previously”>™*. 

Antibodies. A polyclonal antibody against JHDM1B was generated by immun- 
izing rabbits with a peptide containing amino-acid residues 800-1000 of human 
JHDM1B. Rabbit polyclonal antibodies were as follows: anti-Flag (F7425; 
Sigma), anti-haemagglutinin (anti-HA) (71-5500; Zymed), anti-di-methyl 
H3K36 (07-274; Upstate), anti-di-methyl H3K4 (07-030; Upstate), anti-tri- 
methyl H3K4 (05-745; Upstate), anti-tri-methyl H3K9 (07-523; Upstate), 
anti-tri-methyl H3K27 (07-449; Upstate), anti-tri-methyl H3K36 (ab9050; 
Abcam), anti-Alexa Fluor 568 (A11036; Molecular Probes) and anti-phospho- 
H3 (Ser 10) (06-570; Upstate). Mouse monoclonal antibodies were as follows: 
anti-M2 Flag (F3165; Sigma), anti-nucleophosmin (32-5200; Zymed), anti-UBF
(sc-13125; Santa Cruz Biotechnology), anti-α-tubulin (32-2500; Zymed),
anti-JHDM1B (H00084678-MO09; Abnova), anti-GFP (A-11121; Molecular Probes),
anti-BrdU (347580 (7580); BD) and anti-Alexa Fluor 488 (A21121; Molecular 
Probes). 

Chromatin immunoprecipitations. ChIP assays were conducted as described 
previously’. The polyclonal antibody against JHDM1B was used for ChIP analysis
of endogenous JHDM1B. Anti-HA or anti-GFP antibodies were used as
controls. Primer sequences for rDNA were as reported'*. GAPDH sequences 
were: 5'-TCCACCACCCTGTTGCTGTA-3’ and 5'-ACCACAGTCCATGCC- 
ATCAC-3’. 

Cell proliferation and FACS analysis. HeLa cells were transfected with siRNA 
oligonucleotides for two consecutive days, as described in the Methods 
Summary. At 48 h after the second transfection, an 80% confluent dish was 
trypsinized, split into five dishes and retransfected with siRNA oligonucleotides, 
in accordance with the long-term gene-silencing technique from the manufac- 
turer (Qiagen). Cells were collected daily for five days and analysed by standard 
counting methods. FACS and forward scatter analysis were conducted as 
described previously?*”>. 

Solubilized chromatin purification. Purification of solubilized chromatin frac- 
tions was performed as described previously”®. 

BrdU and BrUTP incorporation. BrdU incorporation was performed as 
described previously’’. For BrUTP incorporation, BrUTP (Molecular Probes) 
was added to a FuGENE 6 and 20-mM HEPES mixture (1:10) to a final concentration
of 1 mM for 15 min at 25 °C, as described by Roche Molecular
Biochemicals. During this short period, BrUTP specifically highlights transcrip- 
tionally active nucleoli, and no BrUTP incorporation is observed outside the 
nucleoli’®”’. Slides were washed in PBS and the BrUTP-FuGENE 6 mixture was
added to cells. Cells were incubated at 4 °C for 15 min, washed briefly in PBS and 
then incubated at 37 °C in culture medium for 5 min. Cells were processed for 
immunofluorescence analysis, as stated above, and an anti-BrdU antibody was 
used for detection. The BrdU and nucleolar-BrUTP-positive cells were analysed 
with standard counting methods. 

Data mining. Gene expression data on JHDM1A and JHDM1B were retrieved 
from the Oncomine website (http://www.oncomine.org) and the National 
Institutes of Health (NIH) Gene Expression Omnibus (GEO)
(http://www.ncbi.nlm.nih.gov/geo/). Data from two brain cancer studies were used
for statistical calculations. Bredel et al.’ includes analysis of 50 human gliomas
of various histogenesis with the use of cDNA microarrays. Data were reanalysed
in GraphPad software to show expression levels of JHDM1B and JHDM1A.
Additional details of the study, including the pathological and clinical data, 
are available at Oncomine or on the Cancer Research journal website. Data from 
the second study were from an analysis of grade III and IV gliomas of various 
histological types”. These data were gathered as part of the NIH Neuroscience 
Microarray Consortium (http://arrayconsortium.tgen.org). Data were acquired 
and reanalysed with a signal value threshold of 15,000, using GraphPad software 
to show expression levels of JHDM1B and to determine P and one-tailed t-test 
values. 
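
The kind of comparison described above can be sketched in a few lines of Python. This is a minimal illustration and not the GraphPad procedure itself: the equal-variance (Student) form of the unpaired test, the reading of the threshold as "keep signal values at or above 15,000", and all function names are assumptions.

```python
import math
from statistics import mean, variance


def apply_threshold(values, threshold=15000):
    """Keep signal values at or above the threshold (assumed direction)."""
    return [v for v in values if v >= threshold]


def student_t(group_a, group_b):
    """Return (t, df) for an unpaired, equal-variance two-sample t-test."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    std_err = math.sqrt(pooled * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / std_err, df


def t_pdf(x, df):
    """Probability density of Student's t distribution."""
    log_coeff = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coeff = math.exp(log_coeff) / math.sqrt(df * math.pi)
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)


def one_tailed_p(t, df, steps=2000):
    """Upper-tail probability P(T >= t), integrating the density with
    Simpson's rule on [0, |t|]; steps must be even."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    tail = 0.5 - total * h / 3
    return tail if t >= 0 else 1 - tail
```

Passing the filtered expression values for two tumour groups to student_t and then the resulting statistic to one_tailed_p yields a t value and a one-tailed P of the kind quoted in the Fig. 4 legend; the order of the two groups sets the direction of the tail being tested.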

In vitro histone demethylation assay. In vitro demethylation assays were con- 
ducted as described previously*’. In brief, core histones were incubated with 
Flag-immunopurified JHDM1B from 293T cells in demethylation buffer 
(50 mM Tris-HCl pH 8.0, 50 mM KCl, 10 mM MgCl2, 1 mM α-oxoglutarate,
40 mM FeSO4, 2 mM ascorbic acid) at 37 °C. Core histones (2 µg) were incubated
for 30 min with Flag-immunopurified JHDM1B in a volume of 30 µl.
Reaction mixtures were analysed by western blotting. Quantification by densi- 
tometry was performed with ImageJ (NIH) software. 

FRAP (fluorescence recovery after photobleaching) analysis. HeLa cells were 
plated, transfected with JHDM1B-pEGFP-N1 (Clontech Laboratories) by
FuGENE 6 (Roche) and observed in LabTek II chambers (Nalgene). Selective 
photobleaching of the nucleolus was conducted on an LSM 510 microscope 
(Zeiss) with laser excitation at 488 nm for GFP and a 63×, 1.2 numerical aperture,
oil-immersion objective, as described previously*’. JHDM1B-GFP fluorescence
for each image in the sequence was determined with ImageJ software
(NIH). 

Job mobility is a key trait for anyone pursuing a career in science. A willingness 
to switch disciplines, change labs, relocate to a different country or move 
between the worlds of academia and business can be crucial to success. But 
for many, that last transition can prove to be fairly problematic. 

In Britain, for example, biomedical scientists face several barriers should they
consider moving from academia to industry, according to a report from the
Academy of Medical Sciences (www.acmedsci.ac.uk). The problem is fuelled in
part by fears among academics that a move into business will rob them of some
autonomy and divorce them from academic networks. This in turn is caused
by a lack of information, both about what life in business is like and about the
opportunities that exist in the industrial sector. Among the solutions to this
information gap, the report suggests organizing introductory programmes to allow
academics to gain industrial experience, and an increase in ‘industry open days’ at
universities.


The importance of these sectors getting to know one another better is emphasized
by the Science, Technology and Industry Scoreboard released by the Organisation
for Economic Co-operation and Development (OECD) last month. This shows that
3.9 million people in OECD countries were working in research and development
in 2005. Of these, a significant proportion were in the business sector: 80% in the
United States, 66% in Japan and 50% in the European Union. And in China, the
number of researchers working in the business sector has risen by 15% per year for
the past five years.


But the OECD figures also reveal that flexibility and mobility between disciplines
and across borders are growing in importance. Five times as many scientific
papers had international co-authors in 2005 as in 1985. This globalization
of science means that young researchers have greater opportunities. It offers them
broader scope for finding their favoured living conditions and ideal collaborations,
as long as they have the wherewithal to make the move.


Gene Russo, acting editor of Naturejobs 

MOVERS 


Alan Bernstein, executive director, Global HIV 
Vaccine Enterprise, Seattle, Washington 


2000-07: President, 
Canadian Institutes of Health 
Research, Toronto, Canada 
1994-2000: Director of 
research, Samuel Lunenfeld 
Research Institute of Mount 
Sinai Hospital; professor of 
molecular & medical genetics, 
University of Toronto 
1984-1994: Head of molecular
and developmental biology, 
Samuel Lunenfeld Institute 


Alan Bernstein made sure he would never be pigeonholed 
into one discipline — a move that enabled the would-be 
physicist to make his mark in the biomedical sciences. 

As an undergraduate in maths and physics at the 
University of Toronto, he stumbled across the medical 
biophysics department. There he met Harold Johns, 
discoverer of cobalt-60 radiation therapy for cancer, who 
lured him into the burgeoning field of medical biophysics. 

For his PhD, Bernstein worked on mice with defective 
blood-cell production before realizing that he wanted a 
solid background in genetics. “Genetics is to biology what 
mathematics is to physics,” he says. But, he admits, he had 
no idea that a revolution in biology was imminent. 

In London as a postdoc, Bernstein worked on retroviruses 
at the Imperial Cancer Research Fund. An experiment he did 
there provided genetic evidence of the first retroviral 
oncogene, dubbed src. Back in Toronto, he determined how 
the Friend leukaemia virus affects the ability to form new 
blood cells. His was one of the first labs both to clone a
retrovirus and to exploit it as a vector to deliver genes to cells. 

In 1984, Bernstein was asked to head the division of 
cancer research at the newly formed Samuel Lunenfeld 
Research Institute at Mount Sinai Hospital in Toronto. 
Instead, he chose to lead the institute's division of 
molecular and developmental biology, a field he expected 
would be hot. He spent the next 15 years, including time as 
Lunenfeld’s director, demonstrating how oncogenes control 
developmental processes. 

“Unlike many scientists, Alan doesn't have a comfort 
zone,” says long-time colleague Janet Rossant, researcher
at the Hospital for Sick Children in Toronto. “He's always 
looking for new scientific directions.” As inaugural 
president of the Canadian Institutes of Health Research, 
Bernstein built a US$1-billion budget combining molecular 
biology and social-science research agendas. 

In January, Bernstein will become the first executive 
director of the $750-million Global HIV Vaccine Enterprise, 
sponsored by the Bill & Melinda Gates Foundation. He 
takes on the directorship at a crucial time. Merck recently 
announced that it was dropping the development of what 
had been considered the most promising HIV vaccine 
candidate to date. Bernstein believes the way forward is to 
develop creative routes for collaborations by the vaccine 
enterprise, scientists and companies in the field. “No group 
can do this alone,” says Rossant.
Virginia Gewin 

Latin American challenges 


Young Latin American biomedical 
scientists seeking postdoctoral 
positions abroad face significant 
funding challenges. True, top 
graduates from developing countries 
can often obtain fellowships to work 
abroad from their countries’ national 
research councils or equivalent 
organizations. But the stipends are 
usually lower than those offered by 
prestigious programmes such as the 
European Molecular Biology 
Organization, NATO or the US 
National Academies. And those three 
programmes do not accept 
applications from Latin American 
citizens. 

Fortunately, a few highly selective 
international programmes fund Latin 
American citizens. Two examples are 
the Human Frontier Science Program 
(HFSP) and the Pew Latin American 
Fellows Program, both coordinated by 
Nobel laureate Torsten Wiesel, 
president emeritus at Rockefeller 
University, New York. These not only 
support postdoctoral training but also 
help young scientists return home. 

The difficult transition from postdoc 
to independent investigator is well 
known. It can be especially hard for 
scientists returning to developing 
countries, where doing research can
be frustrating. In Brazil, for example,
they face a long wait for reagents that
often cost more than twice the price
of those ordered in the United States.
Just bringing in scientific equipment
and reagents can be a nightmare
(S. K. Rehen Nature 428, 601; 2004).

As developing countries are short 
of money, support from programmes 
such as the HFSP or Pew can make a 
world of difference. Pew provides 
funds for equipment and supplies 
to set up a laboratory in the country 
of origin. The HFSP emphasizes 
interdisciplinary research and focuses 
on young researchers’ careers, 
offering competitive awards after 
they return home. 

Science in Latin America has grown 
vigorously in the past decade. The 
Latin American share of the world’s 
scientific publications increased from 
1.8% in 1991-95 to 3.4% in
1999-2003 (M. Hermes-Lima et al. IUBMB
Life 4, 199-210; 2007). But its 
visibility is low, indicating a need to 
increase the quality of science in 
general and biomedical sciences in 
particular. Supporting the initial 
phases of independent careers is 
crucial, not only for competitiveness 
but to prevent a brain drain.
Fernanda De Felice is a postdoctoral 
fellow at Northwestern University in 
Evanston, Illinois, and an assistant 
professor at the Federal University of 
Rio de Janeiro, Brazil. 

Meet my lab minion

There comes a point in your academic career when you can employ minions
to do menial lab work for you. Sadly, I have yet to reach that point. When I
need someone to spend most of their working hours sitting in a magnetically
shielded laboratory, coaxing the ancient magnetic signals from
three-billion-year-old South African lavas, there's only one person that I can
delegate the task to: me. Unfortunately, it's not just a matter of placing my
samples in a fancy machine and instantly getting the data I want; every sample
has to be measured a dozen times or more, as I attempt to destabilize unwanted
overprints with ever-stronger blasts of heat or artificial magnetic fields.

Weeks of repetitive measurements might not sound that intellectually
stimulating. But I actually enjoy getting my hands dirty in the lab: I take great
satisfaction from the thought that I’m personally uncovering long-buried clues
about southern Africa's geological history. Besides, I'm not simply repeating
my PhD research. I'm using an unfamiliar machine to analyse ancient lavas
with magnetic quirks and subtleties quite different from the much younger
sediments I've studied before. Even now, I'm still learning the tricks of my
scientific trade, and gaining the broader experience necessary to supervise
my future minions effectively, should I ever be given any.
Chris Rowan is a postdoctoral student in the geology department at the University of
Johannesburg, South Africa.

A hand and honour 


Keeping pace with the human race. 


Brenda Cooper 


John Justice stretched up, fingers scrap- 
ing at cool morning air, then bent down, 
cupping his calves, the nanskin registering 
his fingertips as data points: pressure, heat, 
sweat, angle. 

The hum of the crowd, the band’s drums 
and wind instruments, and even the race 
announcer seemed far, far away. He already 
knew what medals felt like. Before his turn 
in the never-ending war, his men’s relay 
team won gold in London. 

Last month, he’d killed the world record
for the 10,000-metre run, coming in at just 
over 24 minutes. No medal for that. Twenty 
or so news stories, a political cartoon or 
two, and a combination of joy and bitter- 
ness sticking so deep in his gut he threw up 
all over the course when he was done. 

Today, his race would be one-on-one 
against the man whose record still stood 
even after John beat it. Hsui Smith, an 
improbably tall Chinese-American who 
held the world record in the 10,000 metres. 
Who would still hold the official record, 
even after today. 

Discrimination was a bitch. Change was 
tested for like steroids. 

John nearly jumped as his coach, Nico- 
lai, placed his metal hand on the small of 
his back. “Don’t think about it. Just run.
Run for all of us.” 

It was nearly time. “I’ll win,” he nodded
at Nicolai, forcing a smile, staring into the 
shorter, blockier man’s deep brown eyes. 
Nic’s naked hope made him clap the man 
on the back. “I know it matters.”

Nicolai headed for the finish line. As 
the noise and movement swallowed Nic, 
John muttered, “Damned exhibition.” He 
had always yearned to be the fastest man 
in the world. The best that war-wounded- 
John could become for child-John was the 
fastest un-man. 

Kim Moon waited for him on the way 
to the starting blocks, looking more like 
a debutante than an engineer-medic, her 
figure slim and curvy in a one-piece shorts 
outfit. She reached up and hugged him. 
“Good luck.” 

He didn’t have to fake a smile for her. “It’s
all your fault.”

“They're your legs,” she retorted. “The 
best I’ve ever made.”

One of her customers had new hands and
feet with built-in temperature controls, and
had climbed Everest and K2. After an artificial
hand replaced one eaten by frostbite,
the climber had made news by chopping
off the functioning hand for another of
Kim’s sculptures.

Without Kim, he would have walked, 
and run, but never raced. She was all the 
magic of maths and engineering held 
together with heart. He leaned down and 
kissed her forehead, savouring her honey- 
suckle scent. 

As John approached the starting blocks, 
Hsui stood up from a hamstring stretch 
and extended a hand. John took it. Where 
he'd expected to see challenge in the noto- 
riously cocky runner’s eyes, he swore he 
saw fear. His nerves screamed at it. “Why 
did you agree to this?” 


Hsui shook John’s hand, replying softly, 
“My brother lost a hand in the war.” He let 
go and turned to his starting blocks. 

“Thank you,” John said to his back.

John swept Hsui’s fears, and his own, 
into a deep breath and puffed them out, 
relaxing his cheeks. He rocked a bit, set- 
ting his calves, running a quick mental skip 
across the sensors in his skin, checking 
the breeze, temperature and humidity. He 
struggled to close his ears as the announcer 
droned on. Kim’s legs — his — wouldn't 
win by themselves. He mentally shrank 
the world to a bubble around him, and the 
long slender corridor of space on the track 
in front of him. 

The starting gun swept him forward, 
following Hsui. 

He fell in right behind, body straight, 
arms pumping. 

No need to pass. 

Yet. 

He let the first round of the track go, cali- 
brating, biding time. His legs were all he 
had, he’d refused changes to his lungs and
circulatory system, wanting some purity. 
Important not to overrun his breath. 

He was about to pass the fastest human 
ever. The fastest pure human. He threw 
the thought away. A break in stride or a 
stumble could steal the race. Counting and 
breathing and moving. Just the track under 
him and the narrow corridor, the wind on 
his teeth. Breath and wind and stride and 
arms. 

His head turned a little, as if the force of 
Hsui’s run called it. Hsui didn’t return John’s 
darting glance, just kept going, head up. 
Surging. To match him, John told his legs 
to give more, asked his heart to keep up. 

Breath and wind and spine and floor. 
Data instead of Hsui’s desperate face. 

Another turn around the track, a 
matched pair. 

The image of two feet crossing at the 
same time raced through John’s head. 
An honourable outcome. Except he was 
a racer. The sound of Hsui’s breath fell to 
behind John’s shoulder. The finish line 
blurred under him. 

Nic’s arms encircled him. Kim leapt up 
on Nic’s back. Nic grabbed her under the 
knees, boosting her like a child. She looked 
down, her joy at the win overtaken by a 
crease in her brow. “Why so slow?” 

He shook his head, unsure how to 
explain it. “I’ll be right back.” Hsui jogged
well past him now, sweat dripping down 
his back. 

John caught him. “I hope your brother 
is proud of you.” 

Hsui winced. “He went back to the war. 
They put him in special ops ’cause his
hand-eye coordination was so much better 
than anyone else’s.” He looked away. “After 
his enhancements his hand was steadier 
than anybody else’s.”

Hsui had lost face to honour a brother 
with no more change than a hand? A man 
who had done well for himself? Hsui con- 
tinued. “He's dead. They gave him a purple 
heart.” He turned, and without so much 
as a smile, the fastest man in the world 
walked away from the fastest un-man in 
the world.
Brenda Cooper is a futurist, a writer and 
the chief information officer for the City 
of Kirkland, Washington. Her latest 
novel is The Silver Ship and the Sea from 
Tor Books. 